4.1 Using labels

patterns can be labelled (similar to the target marker @)
> adj:[pos = "JJ.*"] ... ;
the label adj then refers to the corresponding token (i.e. its corpus position)
label references are usually evaluated within the global constraint introduced by ::
> adj:[pos = "ADJ."] :: adj < 500;
$\to$ adjectives among the first 500 tokens
annotations of the referenced token can be accessed as adj.word, adj.lemma, etc.
labels are not part of the query result and must be used within the query expression (otherwise, CQP will abort with an error message)
labels attached to optional patterns may be undefined (if the optional pattern matches no token)
> [pos="DT"] a:[pos="JJ"]? [pos="NNS?"] :: a;
$\to$ global constraint a is true iff match contains an adjective
to avoid error messages, test whether a label is defined before accessing its attributes
> [pos="DT"] a:[]? [pos="NNS?"] :: a -> a.pos="JJ";
(-> is the logical implication operator $\to$, cf. Section 2.6)
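the effect of the implication can be modelled outside CQP; the following Python sketch is only a toy model of the semantics described above, not CQP code: the hypothetical function constraint_holds takes the part-of-speech tag of the token bound to label a, or None if the optional pattern matched nothing and a is therefore undefined

```python
from typing import Optional


def constraint_holds(a_pos: Optional[str]) -> bool:
    """Toy model of the global constraint  a -> a.pos = "JJ"."""
    a_defined = a_pos is not None
    # p -> q is equivalent to (not p) or q, so the attribute is only
    # inspected when the label is actually defined
    return (not a_defined) or a_pos == "JJ"


print(constraint_holds(None))   # label undefined: constraint is satisfied
print(constraint_holds("JJ"))   # optional token is an adjective
print(constraint_holds("NN"))   # optional token is not an adjective
```

in this model the constraint only rejects a match when the optional token exists and is not tagged JJ, which is the behaviour the implication is meant to achieve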
labels are used to specify additional constraints that are beyond the scope of ordinary regular expressions
> a:[] "and" b:[] :: a.word = b.word;
labels allow modelling of long-distance dependencies
> a:[pos="PP"] []{0,5} b:[pos = "VB.*"]
    :: b.pos = "VBZ" -> a.lemma = "he|she|it";
(this query ensures that the pronoun preceding a 3rd-person singular verb form is he, she or it; an additional constraint could exclude these pronouns for other verb forms)
labels can be used within patterns as well
> a:[] [pos = a.pos]{3};
$\to$ sequences of four identical part-of-speech tags
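what the back-reference a.pos does can be illustrated on a plain list of tags; the Python sketch below is a simplified model under the assumption that tags are compared for exact equality, and the function name four_identical_tags and the sample tag list are made up for illustration

```python
from typing import List


def four_identical_tags(tags: List[str]) -> List[int]:
    """Return every start position i where tags[i..i+3] are all the same,
    mirroring  a:[] [pos = a.pos]{3}  on a list of POS tags."""
    starts = []
    for i in range(len(tags) - 3):
        a_pos = tags[i]  # tag of the token the label a would point to
        # the next three tokens must carry the same tag as the labelled token
        if all(tags[i + k] == a_pos for k in (1, 2, 3)):
            starts.append(i)
    return starts


tags = ["DT", "JJ", "JJ", "JJ", "JJ", "JJ", "NN"]
print(four_identical_tags(tags))  # start positions 1 and 2
```

the labelled token plus three repetitions gives four tokens in total, which is why the quantifier in the query is {3} rather than {4}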
however, a label cannot be used within the pattern it refers to; to refer to the current corpus position, use the special "this" label instead, which is written as a single underscore (_)
[_.pos = "NPS"] $\Longleftrightarrow$ [pos = "NPS"]
the "this" label (_) can also be used to constrain tokens to a certain range of corpus positions without explicit labels, e.g.
> [pos = "ADJ." & _ < 500];
such constraints are not allowed in query-initial position, so queries such as [_ >= 666]; and [_ < 500 & pos = "ADJ."]; will be rejected as invalid
new in CQP v3.4.17: as a special case, the pattern
> [_ = 666];
can be used to look up a known corpus position efficiently
the built-in functions distance() and distabs() compute the distance and the absolute distance, respectively, between two tokens (referenced by labels)
> a:[pos="DT"] [pos="JJ"]* b:[pos="NNS?"] :: distabs(a,b) >= 5;
$\to$ simple NPs containing 6 or more tokens
the standard anchor points (match, matchend, and target) are also available as labels (with the same names)
> [pos="DT"] [pos="JJ"]* [pos="NNS?"] :: distabs(match, matchend) >= 5;
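the threshold of 5 in these queries corresponds to 6 tokens because the distance between the first and last token of a match is one less than the number of tokens; a small Python sketch of this arithmetic follows, where distabs is a toy stand-in for the CQP function of the same name and span_length is a hypothetical helper

```python
def distabs(a: int, b: int) -> int:
    """Toy stand-in: absolute distance between two corpus positions."""
    return abs(b - a)


def span_length(match: int, matchend: int) -> int:
    """Number of tokens in a match running from match to matchend inclusive."""
    return distabs(match, matchend) + 1


# a match covering corpus positions 100 to 105
print(distabs(100, 105))      # distance between first and last token
print(span_length(100, 105))  # number of tokens in the match
```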
various other built-in functions have been added in recent versions of CQP and can be used with label references or directly with attribute values; see Section 8.3 for a complete list
new in CQP v3.4.17: use function strlen() to filter by word length, e.g. to find particularly long words:
> [word = ".*ment" & strlen(word) >= 16];
NB: inequality comparisons (>, >=, <, <=) are only allowed for integers (corpus positions, string lengths, etc.), but not for strings and regular expressions; CQP versions before v3.4.17 used to silently accept and misinterpret such inequality comparisons