<s len=9>
<np h="it" len=1> It </np>
is
<np h="story" len=6> the story
<pp h="of" len=4> of
<np h="man" len=3> an old man </np>
</pp>
</np>
.
</s>
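key-value pairs in XML start tags are accessible in CQP as additional s-attributes
with annotated values (marked [A] in the show cd; listing): s_len, np_h, np_len,
pp_h, pp_len (cf. Section 1.2)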
annotated values can be accessed through label references:
> <np> a:[] []* </np> :: a.np_h = "bank";
(finds NPs with head lemma bank)
an equivalent, but shorter version:
> /region[np,a] :: a.np_h="bank";
or use the match anchor label automatically set to the first token of the match
> <np> []* </np> :: match.np_h="bank";
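constraints on annotated values can also be tested directly in start tags
(close the region with the end tag of the same s-attribute):
> <np_h = "bank"> []* </np_h>;
comparison operators = and != are supported, together with the %c and %d flags;
= is the default and may be omitted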
> <np_h="bank"><np_len="[1-6]"> []* </np_len></np_h>;
(or access the value of np_len through a label reference)
annotated values can be displayed as <np_h>, <np_len>, ... tags:
> show +np +np_h +np_len;
> cat;
(other corpora may show XML attributes in start tags)
the special label _ refers to the current token:
> [(pos="NNS?") & (lemma = _.np_h)];
(recall that a bare np_h would merely return an integer value indicating
whether the current token is contained in a <np> region, not the
desired annotation string)
typecast annotated values with int() for numerical comparison:
> /region[np,a] :: int(a.np_len) > 30;
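The difference between a regular-expression test on np_len and a numerical comparison after int() can be illustrated outside CQP. The following Python sketch is only a toy model, not CQP code: it assumes that annotation values are stored as strings and that a regular expression has to match the whole string; the function names and the sample values are made up for illustration.

```python
import re

def tag_constraint(pattern, annotation):
    """Toy model of <np_len="pattern">: the regex must match the entire string."""
    return re.fullmatch(pattern, annotation) is not None

def int_greater(annotation, threshold):
    """Toy model of int(a.np_len) > threshold: compare as numbers, not strings."""
    return int(annotation) > threshold

# hypothetical np_len annotations, stored as strings
lengths = ["1", "6", "12", "31", "4"]

# "[1-6]" accepts single digits 1..6 only; "12" and "31" are rejected
short_nps = [v for v in lengths if tag_constraint("[1-6]", v)]

# numeric comparison needs the typecast; in Python, "4" > "30" would be True
long_nps = [v for v in lengths if int_greater(v, 30)]

print(short_nps)
print(long_nps)
```

A regular expression is convenient for small value ranges; open-ended conditions such as "longer than 30 tokens" are easier to state with the typecast.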
> [np_h="bank"];
(does not work: annotated values of s-attributes cannot be tested like
p-attributes inside a token pattern)
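regions embedded in a larger region of the same type are renamed to <np1>, <np2>,
... <pp1>, <pp2>, ... according to their embedding level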
> [pos="CC"] <np1> []* </np1>;
will only find NPs contained in exactly one larger NP
(use show +np +np1 +np2; to experiment)
the s-attributes holding start-tag annotations are renamed in the same way:
<np_h1>, <np_h2>, ..., <pp_len1>, <pp_len2>, ...
> /region[np1, a] :: a.np_h1 = a.np_h within np;
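The renaming scheme can be made concrete with a small Python sketch that turns the sample sentence into regions. This is a simplified toy model written for this tutorial, not the actual corpus encoding tool; the functions encode and annotation_at are hypothetical helpers, and corpus positions are assumed to start at 0.

```python
import re

SAMPLE = """
<s len=9>
<np h="it" len=1> It </np>
is
<np h="story" len=6> the story
<pp h="of" len=4> of
<np h="man" len=3> an old man </np>
</pp>
</np>
.
</s>
"""

TAG = re.compile(r'<(/?)(\w+)([^>]*)>')
ATTR = re.compile(r'(\w+)="?([^"\s]+)"?')

def encode(text):
    """Return (tokens, regions); regions maps an s-attribute name
    to a list of (start, end, value) triples."""
    tokens, regions, open_stack = [], {}, []
    for item in re.findall(r'<[^>]+>|[^\s<]+', text):
        m = TAG.fullmatch(item)
        if m is None:                      # ordinary token
            tokens.append(item)
            continue
        closing, name, attr_text = m.groups()
        if not closing:
            # embedding level = number of open elements with the same name
            level = sum(1 for e in open_stack if e[0] == name)
            open_stack.append((name, level, len(tokens),
                               dict(ATTR.findall(attr_text))))
        else:
            elem, level, start, attrs = open_stack.pop()
            assert elem == name, "unbalanced tags"
            suffix = str(level) if level else ""
            end = len(tokens) - 1
            regions.setdefault(name + suffix, []).append((start, end, None))
            for key, value in attrs.items():
                regions.setdefault(f"{name}_{key}{suffix}", []).append(
                    (start, end, value))
    return tokens, regions

def annotation_at(regions, attr, cpos):
    """Value of the attr region containing position cpos, or None."""
    for start, end, value in regions.get(attr, []):
        if start <= cpos <= end:
            return value
    return None

tokens, regions = encode(SAMPLE)
print(sorted(regions))
print(annotation_at(regions, "np_h1", 5), annotation_at(regions, "np_h", 5))
```

In this model the token "an" (position 5) lies in an <np1> region with head "man" and in the enclosing <np> region with head "story", which is the pair of values that a.np_h1 and a.np_h compare in the query above.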
> (<np>|<np1>|<np2>) []* (</np2>|</np1>|</np>);
CQP ensures that a matching pair of start and end tag is picked from the alternatives
> set MatchingStrategy shortest;
> set MatchingStrategy longest;
> set MatchingStrategy standard;
(re-run the previous query after each set and watch out for “duplicate” matches)
> (<np_h "bank">|<np_h1 "bank">|<np_h2 "bank">) []*
(</np_h2>|</np_h1>|</np_h>);
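Writing out the alternatives by hand becomes tedious for deeper nesting. The Python sketch below builds such query strings mechanically; any_level_query is a hypothetical helper, max_level is assumed to be known from the corpus, and the value is inserted verbatim (it must not contain double quotes).

```python
def any_level_query(attr, value=None, max_level=2):
    """Build a CQP query matching attr regions at embedding levels 0..max_level.

    attr      -- s-attribute name without level suffix, e.g. "np" or "np_h"
    value     -- optional annotation constraint for the start tags
    max_level -- deepest embedding level to include (assumed known)
    """
    names = [attr] + [f"{attr}{i}" for i in range(1, max_level + 1)]
    if value is None:
        starts = [f"<{n}>" for n in names]
    else:
        starts = [f'<{n} "{value}">' for n in names]
    # end tags are listed from the deepest level outwards, as in the text
    ends = [f"</{n}>" for n in reversed(names)]
    return f"({'|'.join(starts)}) []* ({'|'.join(ends)});"

print(any_level_query("np"))
print(any_level_query("np_h", "bank"))
```

With the default max_level of 2 the two calls reproduce the hand-written queries of this section; the resulting string can be pasted into a CQP session.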