Search the EUROPARL corpus with a CQP query.

Online Help

The Europarl CQP Demo interface uses two browser windows. The main window is split into a control frame at the top (where you can enter a corpus query and set some options) and a larger result frame (where query results are displayed). A separate context window will pop up when the links for extended match context are clicked; the context window is also used for some other auxiliary information. If you run multiple CQP Demo sessions in parallel, they will share the same context window.

The control frame: Type a CQP query into the text field at the top left of the control frame, and select a language with the popup menu next to it. A detailed description of the query language, with copious examples, can be found in the CQP Tutorial (online version); there are also some examples at the bottom of this help page. The sort order option determines the ordering of query results: unsorted (i.e. in corpus order), randomised, or one of several lexical orderings (ascending, descending, reverse). Lexical orderings can be based on word forms, lemmas (base forms), or part-of-speech tags. By default, sorting is normalised (ignoring case and accents); normalisation can be deactivated with the checkbox to the right. Press the Run Query button to execute the query. Note that result sets are limited in size: the corpus search stops after the first 50,000 matches. The display options and keyword controls beneath the query field are described below, as are the Frequencies and Distribution buttons.
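
The lexical sort orders and normalisation can be pictured with the following Python sketch. It is not the interface's actual code: it assumes that normalisation means case folding plus removal of accents (approximated here with Unicode NFD decomposition), and that the reverse order compares strings read from right to left, so that words are grouped by their endings. The function names `normalise` and `sort_matches` are hypothetical.

```python
import unicodedata


def normalise(s: str) -> str:
    """Fold case and strip accents (an assumed approximation of normalised sorting)."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_matches(matches, order="ascending", normalised=True):
    """Order match strings by one of the lexical sort orders.

    order is "ascending", "descending", or "reverse"; "reverse" is assumed
    here to compare strings read from right to left, grouping by endings.
    """
    key = normalise if normalised else str
    if order == "ascending":
        return sorted(matches, key=key)
    if order == "descending":
        return sorted(matches, key=key, reverse=True)
    if order == "reverse":
        return sorted(matches, key=lambda s: key(s)[::-1])
    raise ValueError(f"unknown sort order: {order!r}")


matches = ["Été", "eta", "Zeta", "alpha"]
print(sort_matches(matches))                    # ['alpha', 'eta', 'Été', 'Zeta']
print(sort_matches(matches, normalised=False))  # ['Zeta', 'alpha', 'eta', 'Été']
print(sort_matches(matches, order="reverse"))   # ['alpha', 'eta', 'Zeta', 'Été']
```

With normalisation switched on, "Été" sorts between "eta" and "Zeta"; without it, raw code-point order puts the capitalised and accented forms apart from their lowercase counterparts.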

Tagset information: English, German, French, Spanish

The result frame: When a query has been executed, the matching strings are displayed together with some context (by default the complete sentence) and the aligned regions in all selected languages. The matching string itself is printed in bold face and highlighted with a yellow background. Each match is preceded by a header line showing the match number at the left, followed by date and speaker information (if available). Clicking the context link in the left margin displays a larger amount of context in the context window, again showing alignments for all selected languages. If there are more than 20 matches, they are displayed in pages of 20 items each. The navigation bars at the top left and bottom left allow you to step through the pages: click << or >> to move back or forward by an entire page (20 matches), or < or > to move back or forward by half a page (10 matches). Two further buttons take you back to the first page or straight to the last page. You can also select a page from the drop-down menu in the middle of the navigation bar and jump directly to it by clicking the Go button.
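
The paging arithmetic described above can be modelled as follows. This is a hypothetical sketch, not the interface's implementation: it tracks the 0-based index of the first match on screen, uses the invented labels "first" and "last" for the first-page and last-page buttons, and assumes that no step can move past the start of the last page.

```python
PAGE_SIZE = 20              # matches per page (from the help text)
HALF_PAGE = PAGE_SIZE // 2  # step size of the < and > buttons


def page_count(n_matches: int) -> int:
    """Number of pages needed for n_matches results (ceiling division)."""
    return -(-n_matches // PAGE_SIZE)


def navigate(start: int, button: str, n_matches: int) -> int:
    """Return the 0-based index of the first match shown after pressing a button.

    "first" and "last" are hypothetical labels for the first-page and
    last-page buttons.
    """
    last_start = max(0, (page_count(n_matches) - 1) * PAGE_SIZE)
    steps = {"<<": -PAGE_SIZE, ">>": PAGE_SIZE, "<": -HALF_PAGE, ">": HALF_PAGE}
    if button == "first":
        new_start = 0
    elif button == "last":
        new_start = last_start
    elif button in steps:
        new_start = start + steps[button]
    else:
        raise ValueError(f"unknown button: {button!r}")
    # Never move before the first match or past the start of the last page.
    return min(max(new_start, 0), last_start)


def goto_page(page: int) -> int:
    """First match index of a page chosen from the drop-down menu (1-based)."""
    return (page - 1) * PAGE_SIZE
```

For 45 matches there are three pages; pressing >> on the first page shows matches 21 to 40, and pressing > from there shows matches 31 onwards.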

The display options allow you to customise the information shown in the result frame. Note that changes in the display options only take effect when the query is re-run (query results are cached, so they can be re-displayed immediately). Alternatively, you can set display options using the menus in the top right or bottom right corner of the result frame and activate the new settings by clicking the Apply button (changes made here will be undone when a new query is executed). The leftmost display menu selects the information shown for tokens, allowing a choice of word forms with or without part-of-speech tags as subscripts, and base forms (lemma) with POS tags. The middle menu determines the amount of context shown around matches in the source language. By default, the context consists of the full sentence containing the match (sentence). It can be extended to include the preceding and following sentence (2 sentences) or a total of ten sentences (10 sentences). Alternatively, complete paragraphs or speaker turns can be displayed. The context choice does not affect target languages, which will always display the smallest unit aligned to each match. The checkboxes on the right select languages for display. Note that the current source language is always activated implicitly.

The keyword controls: (NB: You may have to scroll or resize the control frame to see the keyword controls.) When the checkbox at the beginning of the keyword line is checked, CQP will execute a set keyword command after running the query. The layout of these controls was designed to mirror the corresponding syntax in the CQP query language as closely as possible (see the CQP Tutorial for details). As an example, search for a noun in the corpus and make the following keyword settings: (1) check the keyword box; (2) leave the first menu at nearest; (3) type pos="VV.*" into the text field; (4) leave the next menu blank; (5) set the following menu to 1 s; (6) leave the last menu at match and the final box unchecked. When you click the Run Query button, the full verb closest to each instance of the noun will be underlined in red. Because of the 1 s setting, only verbs within the same sentence are considered.
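
The effect of the example keyword settings can be approximated in Python. The sketch below is an assumption-laden model, not CQP itself: tokens are hypothetical (word, POS tag, sentence number) triples, the POS pattern must match the whole tag (as CQP regular expressions do), the match token is skipped because the final box is left unchecked (assumed here to be CQP's inclusive option), and a tie between a left and a right candidate goes to the left one. In CQP syntax, the settings presumably correspond to a command along the lines of `set Last keyword nearest [pos="VV.*"] within 1 s from match;`.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical token layout: (word, POS tag, sentence number).
Token = Tuple[str, str, int]


def nearest_keyword(tokens: List[Token], match_pos: int, pos_regex: str) -> Optional[int]:
    """Return the index of the token nearest to match_pos whose POS tag matches
    pos_regex as a whole, looking in both directions but only inside the
    sentence that contains the match. The match token itself is skipped, and a
    tie between left and right is resolved in favour of the left token."""
    pattern = re.compile(pos_regex)
    sentence = tokens[match_pos][2]
    for distance in range(1, len(tokens)):
        for j in (match_pos - distance, match_pos + distance):
            if (0 <= j < len(tokens)
                    and tokens[j][2] == sentence
                    and pattern.fullmatch(tokens[j][1])):
                return j
    return None  # no matching token in this sentence


tokens = [
    ("The", "DT", 0), ("committee", "NN", 0), ("adopted", "VVD", 0),
    ("the", "DT", 0), ("report", "NN", 0), (".", "SENT", 0),
    ("Members", "NNS", 1), ("voted", "VVD", 1), (".", "SENT", 1),
]
for noun in (1, 4, 6):
    verb = nearest_keyword(tokens, noun, r"VV.*")
    print(tokens[noun][0], "->", tokens[verb][0])
```

For this toy text, both nouns of the first sentence are linked to "adopted", while "Members" is linked to "voted", because the search never crosses the sentence boundary.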

Match frequencies: If you click the Frequencies button instead of Run Query, a list of distinct query matches and their corpus frequencies will be displayed in the result frame. The list respects the currently selected sort order and options; in particular, matching strings are normalised (ignoring case and accents) if the sort normalised box is checked. Since the list follows the sort order options, it is also possible to count lemmata or even (patterns of) POS tags in this way. Click on any of the strings to show the corresponding matches in the context window (a navigation bar and display options can be found at the bottom of the context window).
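
A frequency list of this kind amounts to counting distinct match strings, optionally after normalisation. The sketch below is hypothetical: it assumes that normalisation means case folding plus accent stripping, and it orders the list by descending frequency (ties alphabetically) as a simple default, whereas the interface lets the sort order options control the ordering. Passing lemma or POS-tag strings instead of word forms would count those in the same way.

```python
import unicodedata
from collections import Counter


def normalise(s: str) -> str:
    """Fold case and strip accents (assumed equivalent of 'sort normalised')."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def frequency_list(matches, normalised=True):
    """Count distinct match strings; most frequent first, ties alphabetical."""
    keys = (normalise(m) if normalised else m for m in matches)
    counts = Counter(keys)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(frequency_list(["nuclear energy", "Nuclear energy", "renewable energy"]))
# [('nuclear energy', 2), ('renewable energy', 1)]
```

Without normalisation, "Nuclear energy" and "nuclear energy" would be counted as two separate entries.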

Corpus distribution: Click the Distribution button to show the distribution of query matches across years in the left part of the result frame, and the distribution according to the tongue of the respective speaker in the right part. Note that the speaker's tongue is often unspecified, which limits the usefulness of this information. The bars in the left part are scaled to account for the slightly different number of tokens in each year. The blue percentage value at the end of each bar compares the relative frequency in that year to the average, so values above 100% indicate above-average frequency (search for "coffee" to see a striking example). Clicking on any of the labels will switch the result frame to show the corresponding matches. Press your browser's Back button to return to the distribution display.
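
The scaling of the year bars can be sketched as follows. The sketch assumes that "the average" is the corpus-wide relative frequency (total matches divided by total tokens); the interface may define it differently, and both the function name and the toy figures in the usage example are invented for illustration, not taken from Europarl.

```python
def year_distribution(matches_per_year, tokens_per_year):
    """Map each year to 100 * (relative frequency in that year) divided by
    the corpus-wide relative frequency, so 100 means exactly average."""
    total_matches = sum(matches_per_year.get(year, 0) for year in tokens_per_year)
    total_tokens = sum(tokens_per_year.values())
    overall = total_matches / total_tokens  # assumed definition of "average"
    percentages = {}
    for year, n_tokens in tokens_per_year.items():
        relative = matches_per_year.get(year, 0) / n_tokens  # scale by year size
        percentages[year] = 100 * relative / overall if overall else 0.0
    return percentages


# Toy figures for illustration only (not real Europarl counts).
tokens = {2001: 1_000_000, 2002: 2_000_000, 2003: 1_000_000}
matches = {2001: 10, 2002: 20, 2003: 30}
for year, pct in year_distribution(matches, tokens).items():
    print(f"{year}: {pct:.0f}%")
```

Although 2002 has twice as many matches as 2001 in this toy example, both years score about 67% because 2002 also has twice as many tokens; only 2003 is above average, at 200%.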

Example queries: You can copy & paste these queries into the text field in the top left corner of the control frame.

English:

    [pos = "JJ.*"] [lemma = "energy"]
    
    "(over|under)[a-z]+ion"%c

    ([pos = "IN|TO"] [pos = "DT.*"]? [pos = "JJ.*"]* [pos = "N.*"]+){3}

    [pos = "IN|TO"] a:[pos = "N.*"] [pos = "IN|TO"] b:[pos = "N.*"] :: a.lemma = b.lemma
    

German:

    "(Kern|Atom)kraft.*"
    
    [lemma = "Gesetz"]
    
    "sehr"%c @[pos = "ADJA"] [pos = "NN"]
    
    "von"%c [pos = "ART"]? [pos = "ADJA"]* [pos = "N.*"]+ "wegen"
    
