| word | word forms (“plain text”) |
| pos | part-of-speech tags (Penn Treebank tagset) |
| lemma | base forms (lemmata) |
| novel | individual novels |
| novel_title | title of the novel |
| book | when text is subdivided into books |
| book_num | number of the book |
| chapter | chapters |
| chapter_num | number of the chapter |
| chapter_title | optional title of the chapter |
| title | encloses title strings of novels, books, and chapters |
| p | paragraphs |
| p_len | length of the paragraph (in words) |
| s | sentences |
| s_len | length of the sentence (in words) |
| np | noun phrases |
| np_h | head lemma of the noun phrase |
| np_len | length of the noun phrase (in words) |
| pp | prepositional phrases |
| pp_h | functional head of the PP (preposition) |
| pp_len | length of the PP (in words) |
| word | word forms (“plain text”) |
| pos | part-of-speech tag (STTS tagset) |
| lemma | base forms (lemmatised forms) |
| alemma | ambiguous lemmatisation (feature set, see examples in Section 6.6) |
| agr | noun agreement features (feature set, see examples in Section 6.6) |
Each agreement feature has the form `ccc:g:nn:ddd`, where:
| ccc | = | case | (Nom, Gen, Dat, Akk) |
| g | = | gender | (M, F, N) |
| nn | = | number | (Sg, Pl) |
| ddd | = | determination | (Def, Ind, Nil) |
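
As an illustration of the `ccc:g:nn:ddd` layout, the following minimal sketch splits a single agreement feature into its four fields and checks each against the value lists given above. The names `Agreement` and `parse_agreement` are hypothetical helpers introduced here, not part of any corpus tool, and the sample value `Dat:F:Sg:Def` is made up for the example.

```python
from typing import NamedTuple

# Value inventories copied from the table above.
CASES = {"Nom", "Gen", "Dat", "Akk"}
GENDERS = {"M", "F", "N"}
NUMBERS = {"Sg", "Pl"}
DETERMINATIONS = {"Def", "Ind", "Nil"}


class Agreement(NamedTuple):
    """One agreement feature (hypothetical container type)."""
    case: str
    gender: str
    number: str
    determination: str


def parse_agreement(feature: str) -> Agreement:
    """Split 'ccc:g:nn:ddd' into fields and validate each one."""
    parts = feature.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields: {feature!r}")
    checks = zip(parts, (CASES, GENDERS, NUMBERS, DETERMINATIONS),
                 ("case", "gender", "number", "determination"))
    for value, allowed, label in checks:
        if value not in allowed:
            raise ValueError(f"invalid {label} {value!r} in {feature!r}")
    return Agreement(*parts)


# Made-up example value: dative, feminine, singular, definite.
print(parse_agreement("Dat:F:Sg:Def"))
```

A named tuple keeps the four fields addressable by name (`.case`, `.number`), which tends to be easier to read in filters than positional indexing.
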
| `<s>` | sentences |
| `<pp>` | prepositional phrases |
| `<np>` | noun phrases |
| `<ap>` | adjectival phrases |
| `<advp>` | adverbial phrases |
| `<vc>` | verbal complexes |
| `<cl>` | subclauses |
<s len="..">
<pp f=".." h=".." agr=".." len="..">
<np f=".." h=".." agr=".." len="..">
<ap f=".." h=".." agr=".." len="..">
<advp f=".." len="..">
<vc f=".." len="..">
<cl f=".." h=".." vlem=".." len="..">
len = length of region (in tokens)
f = properties (feature set, values listed below)
h = lexical head of phrase (<pp_h>: “prep:noun”)
agr = nominal agreement features (feature set, partially disambiguated)
vlem = lemma of main verb
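
The region annotations above can be read programmatically. The sketch below is a minimal illustration, assuming the start tags appear as text in exactly the XML-like notation shown (double-quoted attribute values, no escaping); it extracts the region name and its attributes and splits a `<pp_h>` value of the form “prep:noun”. The function names `parse_start_tag` and `split_pp_head`, and the sample tag with its values, are hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumed surface form: <name attr="value" attr="value" ...>
TAG_RE = re.compile(r'<(\w+)((?:\s+\w+="[^"]*")*)\s*>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_start_tag(tag: str) -> Tuple[str, Dict[str, str]]:
    """Return (region name, attribute dict) for one start tag."""
    match = TAG_RE.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a start tag in the assumed notation: {tag!r}")
    name, attr_text = match.groups()
    return name, dict(ATTR_RE.findall(attr_text))


def split_pp_head(head: str) -> Tuple[str, str]:
    """Split a <pp_h> value 'prep:noun' at the first colon."""
    prep, sep, noun = head.partition(":")
    if not sep:
        raise ValueError(f"expected 'prep:noun', got {head!r}")
    return prep, noun


# Made-up example tag; the attribute values are illustrative only.
name, attrs = parse_start_tag('<pp f="|norm|" h="mit:Recht" len="2">')
print(name, attrs["len"], split_pp_head(attrs["h"]))
```

Attribute values are returned as strings; `len` would still need an `int()` conversion before any numeric comparison.
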
| `<np_f>` | norm (“normal” NP), ne (named entity), rel (relative pronoun), wh (wh-pronoun), pron (pronoun), refl (reflexive pronoun), es (es), sich (sich), nodet (no determiner), quot (in quotes), brac (in parentheses), numb (list item), trunc (contains truncated nouns), card (cardinal number), date (date string), year (specifies year), temp (temporal), meas (measure noun), street (address), tel (telephone number), news (news agency) |
| `<pp_f>` | same as `<np_f>` (features are projected from NP), plus nogen (no genitive modifier) |
| `<ap_f>` | norm (“normal” AP), pred (predicative AP), invar (invariant adjective), vder (deverbal adjective), quot (in quotes), pp (contains PP complement), hypo (uncertain, AP was conjectured by chunker) |
| `<advp_f>` | norm, temp (temporal adverbial), loc (locative adverbial), dirfrom (directional source), dirto (directional path) |
| `<vc_f>` | norm, inf (infinitive), zu (zu-infinitive) |
| `<cl_f>` | rel (relative clause), subord (subordinate clause), fin (finite), inf (infinitive), comp (comparative clause) |
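
To show how the property inventories relate, here is a minimal sketch that builds the PP inventory from the NP inventory plus `nogen` and reports any values in a feature set that fall outside a given inventory. It assumes feature-set values are stored as pipe-delimited strings such as `|norm|nogen|`; that encoding, the helper names, and the sample value are assumptions made for this example.

```python
from typing import FrozenSet, Set

# NP property features as listed in the table above.
NP_FEATURES: FrozenSet[str] = frozenset({
    "norm", "ne", "rel", "wh", "pron", "refl", "es", "sich", "nodet",
    "quot", "brac", "numb", "trunc", "card", "date", "year", "temp",
    "meas", "street", "tel", "news",
})

# PP features are projected from the NP, plus 'nogen'.
PP_FEATURES: FrozenSet[str] = NP_FEATURES | {"nogen"}


def parse_feature_set(value: str) -> Set[str]:
    """Split an assumed pipe-delimited feature set into its members."""
    return {item for item in value.split("|") if item}


def unknown_features(value: str, inventory: FrozenSet[str]) -> Set[str]:
    """Return members of the feature set that are not in the inventory."""
    return parse_feature_set(value) - inventory


# Made-up example: 'nogen' is valid for a PP but not for an NP.
print(unknown_features("|norm|nogen|", NP_FEATURES))
print(unknown_features("|norm|nogen|", PP_FEATURES))
```
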