<s len=9> <np h="it" len=1> It </np> is <np h="story" len=6> the story <pp h="of" len=4> of <np h="man" len=3> an old man </np> </pp> </np> . </s>
key-value pairs in the XML start tags are available as s-attributes with annotated values (marked [A] in the show cd; listing): s_len, np_h, np_len, pp_h, pp_len (cf. Section 1.2)
> <np> a:[] []* </np> :: a.np_h = "bank";
(finds NPs whose head lemma is bank)
an equivalent, but shorter version:
> /region[np,a] :: a.np_h="bank";
or use the match anchor label, which is automatically set to the first token of the match:
> <np> []* </np> :: match.np_h="bank";
> <np_h = "bank"> []* </np_h>;
the comparison operators = and != are supported, together with the %c and %d flags; = is the default and may be omitted
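for example, the following variants of the query above sketch how the operators and flags can be used (they assume the same np_h annotation; results depend on the corpus):
> <np_h = "bank" %c> []* </np_h>;
(case-insensitive match on the head lemma)
> <np_h != "bank"> []* </np_h>;
(NPs whose head lemma is not bank)
> <np_h "bank"> []* </np_h>;
(same as the query above, with the default = omitted)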
> <np_h="bank"><np_len="[1-6]"> []* </np_len></np_h>;
(or access the value of np_len through a label reference)
use show to display the <np_h>, <np_len>, ... tags:
> show +np +np_h +np_len;
> cat;
(other corpora may show XML attributes in start tags)
> [(pos="NNS?") & (lemma = _.np_h)];
(recall that np_h would merely return an integer value indicating whether the current token is contained in a <np> region, not the desired annotation string)
> /region[np,a] :: int(a.np_len) > 30;
> [np_h="bank"];
does not work! (annotated values are only accessible through a label reference such as _.np_h or a.np_h)
regions of nested elements of the same type are renamed <np1>, <np2>, ..., <pp1>, <pp2>, ...
> [pos="CC"] <np1> []* </np1>;
will only find NPs contained in exactly one larger NP (use show +np +np1 +np2; to experiment)
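by the same naming scheme, deeper embedding levels can be queried in the same way; a sketch, assuming the corpus contains NPs nested at that depth:
> show +np +np1 +np2;
> [pos="CC"] <np2> []* </np2>;
(finds NPs contained in exactly two larger NPs)
> cat;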
the key-value pairs of nested regions are renamed accordingly: <np_h1>, <np_h2>, ..., <pp_len1>, <pp_len2>, ...
> /region[np1, a] :: a.np_h1 = a.np_h within np;
> (<np>|<np1>|<np2>) []* (</np2>|</np1>|</np>);
CQP ensures that a matching pair of start and end tags is picked from the alternatives
> set MatchingStrategy shortest;
> set MatchingStrategy longest;
> set MatchingStrategy standard;
(re-run the previous query after each set and watch out for “duplicate” matches)
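one way to make the comparison concrete is to store each result as a named query and compare the sizes; a sketch, where A, B and C are arbitrary names chosen here:
> set MatchingStrategy shortest;
> A = (<np>|<np1>|<np2>) []* (</np2>|</np1>|</np>);
> set MatchingStrategy longest;
> B = (<np>|<np1>|<np2>) []* (</np2>|</np1>|</np>);
> set MatchingStrategy standard;
> C = (<np>|<np1>|<np2>) []* (</np2>|</np1>|</np>);
> size A;
> size B;
> size C;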
> (<np_h "bank">|<np_h1 "bank">|<np_h2 "bank">) []* (</np_h2>|</np_h1>|</np_h>);