Create word lists for EUROPARL corpus | Online Help
The word list tool creates a list of all word forms, part-of-speech tags or lemmata in one of the languages, together with their frequencies. Word lists can be sorted by frequency, alphabetically (lexical) or reverse; they can also be case-folded or normalised (case and diacritics removed).
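As a rough illustration of what such a word list contains, the following Python sketch counts items and applies the sort and normalisation options described above. It is not the tool's implementation: the function names `normalise` and `make_word_list` are hypothetical, and the interpretation of "reverse" as sorting by reversed strings (so that items with the same ending are grouped together) is an assumption.

```python
from collections import Counter
import unicodedata


def normalise(item, fold_case=False, strip_diacritics=False):
    """Optionally lower-case an item and remove its diacritics."""
    if fold_case:
        item = item.lower()
    if strip_diacritics:
        # Decompose characters, then drop the combining marks (accents, umlauts).
        decomposed = unicodedata.normalize("NFD", item)
        item = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return item


def make_word_list(tokens, sort="frequency", fold_case=False, strip_diacritics=False):
    """Return (item, frequency) pairs in the requested order."""
    counts = Counter(normalise(t, fold_case, strip_diacritics) for t in tokens)
    if sort == "frequency":
        key = lambda pair: (-pair[1], pair[0])   # most frequent first
    elif sort == "lexical":
        key = lambda pair: pair[0]               # by the item itself
    elif sort == "reverse":
        key = lambda pair: pair[0][::-1]         # assumption: by reversed string
    else:
        raise ValueError(f"unknown sort order: {sort}")
    return sorted(counts.items(), key=key)


# Toy input, invented for illustration only.
tokens = ["Über", "über", "uber", "Haus", "Maus"]
print(make_word_list(tokens, fold_case=True, strip_diacritics=True))
```

With both normalisation options switched on, the three spellings of "über" are counted as a single item; with both switched off, each spelling is listed separately.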
In addition, items can be filtered with a wildcard expression (using the same CEQL syntax as in simple queries, e.g. ? to match a single character, * for an arbitrary substring and + for one or more characters), optionally ignoring case and diacritics, and with a frequency threshold. For example, set the attribute to lemma, sort order to frequency, language to German, and use *ung as a filter to obtain a list of -ung nominalisations in the German part of the Europarl corpus.
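The effect of such a filter can be sketched in Python by translating the three wildcards mentioned above into a regular expression. This is a minimal sketch only: it handles just ?, * and +, it ignores any other CEQL features, and it does not cover the option to ignore diacritics. The function names and the toy frequencies are invented for illustration.

```python
import re


def wildcard_to_regex(pattern, ignore_case=False):
    """Translate ?, * and + into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # arbitrary substring, possibly empty
        elif ch == "+":
            parts.append(".+")     # one or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE if ignore_case else 0)


def filter_items(frequencies, pattern, min_freq=1, ignore_case=False):
    """Keep items that match the whole pattern and reach the threshold."""
    regex = wildcard_to_regex(pattern, ignore_case)
    return [
        (item, freq)
        for item, freq in frequencies.items()
        if freq >= min_freq and regex.fullmatch(item)
    ]


# Toy frequencies, invented for illustration only.
frequencies = {"Regierung": 120, "Haltung": 40, "ungern": 15, "Lösung": 3}
print(filter_items(frequencies, "*ung", min_freq=5))
```

Because the pattern must match the whole item, *ung keeps only items that end in -ung; "ungern" is rejected although it contains the same letters, and "Lösung" is dropped by the frequency threshold.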
After clicking Make Word List, the word list is shown in the result frame. Most of the items are highlighted as links that display the corresponding corpus instances in the context window. The layout of these examples is determined by the display settings in the control frame at the time the word list was generated. You can page through the corpus examples, show additional context, and change the display settings in the usual way (full description). For some items, translation candidates can be generated by clicking the orange T next to them. Candidates are obtained from the sentence alignment by statistical association techniques and are ranked by approximate Dice scores.
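To give an idea of how a Dice score ranks translation candidates, the sketch below computes the plain Dice coefficient over aligned sentence pairs. How the tool approximates the score is not described here, so this is only the textbook definition; the function name `dice_candidates` and the toy sentence pairs are invented for illustration.

```python
from collections import Counter


def dice_candidates(aligned_pairs, source_item):
    """Rank target-language items by their Dice score with source_item.

    Dice = 2 * f(s, t) / (f(s) + f(t)), where f(s) and f(t) are the numbers
    of aligned pairs containing the source and the target item, and f(s, t)
    is the number of pairs containing both.
    """
    source_count = 0
    target_counts = Counter()
    joint_counts = Counter()
    for source_tokens, target_tokens in aligned_pairs:
        target_types = set(target_tokens)      # count each item once per pair
        target_counts.update(target_types)
        if source_item in source_tokens:
            source_count += 1
            joint_counts.update(target_types)
    scores = {
        target: 2 * joint / (source_count + target_counts[target])
        for target, joint in joint_counts.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Toy aligned sentence pairs, invented for illustration only.
pairs = [
    (["die", "Regierung", "handelt"], ["the", "government", "acts"]),
    (["die", "Regierung", "wartet"], ["the", "government", "waits"]),
    (["die", "Kommission", "handelt"], ["the", "commission", "acts"]),
]
for candidate, score in dice_candidates(pairs, "Regierung"):
    print(candidate, round(score, 3))
```

In this toy data "government" occurs in exactly the same sentence pairs as "Regierung" and therefore receives the highest score, while "the" ranks lower because it also occurs in a pair without "Regierung".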