4.2 Structural attributes

XML tags match start/end of s-attribute region (shown as XML tags in Figure 1)
> <s> [pos = "VBG"];
> [pos = "VBG"] [pos = "SENT"]? </s>;
$\to$ present participle at start or end of sentence
pairs of start/end tags enclose single region (if StrictRegions option is enabled)
> <np> []* ([pos="JJ.*"] []*){3,} </np>;
$\to$ NP containing at least 3 adjectives
(when StrictRegions is switched off, XML tags match any region boundaries and may skip intervening boundaries as well as material outside the corresponding regions)
/region[] macro matches entire region
/region[np]; $\Longleftrightarrow$ <np> []* </np>;
different tags can be mixed
> <s><np>[]*</np> []* <np>[]*</np></s>;
$\to$ sentence that starts and ends with a noun phrase (NP)
the name of a structural attribute (e.g. np) used within a pattern evaluates to true iff the corresponding token is contained in a region of this attribute (here, a <np> region)
> [(pos = "NNS?") & !np];
$\to$ noun that is not contained in a noun phrase (NP)
built-in functions lbound() and rbound() test for start/end of a region
> [(pos = "VBG") & lbound(s)];
$\to$ present participle at start of sentence
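The membership test and the lbound()/rbound() tests can be pictured with a small Python model. This is only a sketch of the semantics described above, not CQP's implementation: it assumes that the regions of one s-attribute are stored as sorted, non-overlapping, inclusive (start, end) pairs of corpus positions, and the names NP_REGIONS, find_region and contained are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical toy data: inclusive (start, end) corpus positions of <np> regions,
# sorted by start and non-overlapping (an assumption of this sketch).
NP_REGIONS = [(0, 1), (4, 6)]

def find_region(regions, cpos):
    """Return the (start, end) region containing cpos, or None if there is none."""
    starts = [start for start, _ in regions]
    i = bisect_right(starts, cpos) - 1      # last region starting at or before cpos
    if i >= 0 and regions[i][1] >= cpos:
        return regions[i]
    return None

def contained(regions, cpos):
    """Models a bare attribute name such as `np` inside a pattern."""
    return find_region(regions, cpos) is not None

def lbound(regions, cpos):
    """True iff cpos is the first token of a region."""
    region = find_region(regions, cpos)
    return region is not None and region[0] == cpos

def rbound(regions, cpos):
    """True iff cpos is the last token of a region."""
    region = find_region(regions, cpos)
    return region is not None and region[1] == cpos
```

In this model, `not contained(NP_REGIONS, cpos)` plays the role of `!np`, and `lbound(...)` is true only for the token that opens a region.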
new in CQP v3.4.13: built-in functions lbound_of() and rbound_of() return the corpus positions of the start/end of a region. Because of technical limitations, the anchor position has to be specified explicitly as a second argument, which will often be the this label (_):
> [(word = "\d+") & (lbound_of(s, _) = lbound_of(chapter, _))];
$\to$ a number in the first sentence of a chapter
The same query could also be written with an explicit label or anchor reference in a global constraint (which is perhaps easier to read):
> "\d+" :: lbound_of(s, match) = lbound_of(chapter, match);
If the referenced position is not contained in a suitable s-attribute region, the functions return an undefined value, which evaluates to false in most contexts (in particular, all comparisons with this value will be false).
The lbound_of() and rbound_of() functions are mainly used in connection with distance() or distabs(). For example, to find occurrences of the word end within the first 40 tokens of a chapter:
> [word = "end"%c & distabs(_, lbound_of(chapter, _)) < 40];
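The behaviour of the undefined value is easy to overlook, so the following Python sketch models it with None. It is an illustration under stated assumptions, not CQP's implementation: the toy data CHAPTERS and SENTENCES and the helpers equal and less_than are hypothetical, and the sketch assumes that distabs() passes an undefined argument through as undefined.

```python
# Hypothetical toy data: inclusive (start, end) corpus positions of regions.
CHAPTERS = [(0, 99), (100, 249)]
SENTENCES = [(0, 9), (10, 25), (100, 112)]

def lbound_of(regions, cpos):
    """Start of the region containing cpos, or None (undefined)."""
    for start, end in regions:
        if start <= cpos <= end:
            return start
    return None

def rbound_of(regions, cpos):
    """End of the region containing cpos, or None (undefined)."""
    for start, end in regions:
        if start <= cpos <= end:
            return end
    return None

def distabs(a, b):
    """Absolute distance; undefined if either position is undefined (assumption)."""
    if a is None or b is None:
        return None
    return abs(a - b)

def equal(a, b):
    """Comparison that is false whenever an operand is undefined."""
    return a is not None and b is not None and a == b

def less_than(value, limit):
    """Comparison that is false whenever the value is undefined."""
    return value is not None and value < limit

def in_first_sentence_of_chapter(cpos):
    return equal(lbound_of(SENTENCES, cpos), lbound_of(CHAPTERS, cpos))

def near_chapter_start(cpos, limit=40):
    return less_than(distabs(cpos, lbound_of(CHAPTERS, cpos)), limit)
```

Note that a position outside every region fails both tests in this model: two undefined values do not compare as equal.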
use within to restrict matches of a query to a single region
> [pos="NN"] []* [pos="NN"] within np;
$\to$ sequence of two singular nouns within the same NP
most linguistic queries should include the restriction within s to avoid crossing sentence boundaries; note, however, that only a single within clause may be specified
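The condition imposed by a within clause can be stated as a short Python sketch: a match is acceptable only if it lies completely inside one region of the given attribute. The sketch only describes which match ranges satisfy the condition; how CQP enforces the restriction internally is not modelled here. The names satisfies_within and NP_REGIONS are hypothetical.

```python
# Hypothetical toy data: inclusive (start, end) corpus positions of <np> regions.
NP_REGIONS = [(3, 8), (12, 15)]

def satisfies_within(match, regions):
    """True iff the whole match range lies inside a single region."""
    m_start, m_end = match
    return any(r_start <= m_start and m_end <= r_end
               for r_start, r_end in regions)

# Candidate match ranges, again as inclusive (start, end) pairs.
candidates = [(4, 6), (7, 13), (12, 15), (0, 2)]
accepted = [m for m in candidates if satisfies_within(m, NP_REGIONS)]
```

The range (7, 13) is rejected even though both of its end points fall inside <np> regions, because they belong to two different regions.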
query matches can be expanded to containing regions of s-attributes
> A = [pos="JJ.*"] ([]* [pos="JJ.*"]){2} within np;
> B = A expand to np;
one-sided expansion is selected with the optional left or right keyword
> C = B expand left to s;
the expansion can be combined with a query, following all other modifiers
> [pos="JJ.*"] ([]* [pos="JJ.*"]){2} within np cut 20 expand to np;
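The effect of expand on a single match can be sketched in Python as follows. This is a simplified model, not CQP's implementation: it assumes that the start of the match moves to the start of the region containing it, the end moves to the end of the region containing it, and a boundary that is not inside any region stays where it is. The function name expand and the toy data are hypothetical.

```python
# Hypothetical toy data: inclusive (start, end) corpus positions of regions.
NP_REGIONS = [(3, 8)]
S_REGIONS = [(0, 12)]

def expand(match, regions, side="both"):
    """Expand an inclusive (start, end) match to the enclosing region(s).

    side is "both", "left" or "right", mirroring the optional keyword.
    """
    m_start, m_end = match
    new_start, new_end = m_start, m_end
    for r_start, r_end in regions:
        if side in ("both", "left") and r_start <= m_start <= r_end:
            new_start = r_start          # move left edge to region start
        if side in ("both", "right") and r_start <= m_end <= r_end:
            new_end = r_end              # move right edge to region end
    return new_start, new_end

a = (4, 6)                               # a match inside the NP
b = expand(a, NP_REGIONS)                # like: B = A expand to np;
c = expand(b, S_REGIONS, side="left")    # like: C = B expand left to s;
```

In this model, b covers the whole NP, while c additionally reaches back to the start of the sentence but keeps the right edge of b.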