- New in CQP v3.4.32: match selectors allow a subpart of the query to be returned as the final match.
- More sophisticated CQP queries often include additional material before and after the actual matching range of interest as “context filters”. Consider e.g. an it-extraposition construction such as “it is the times that have changed”, which can be matched by the query
> It = "it"%c [lemma="be"] [pos="DT"] @[pos="NNS?"] "that"%c [pos="V.*"];
- We may only be interested in the nouns that are emphasised in this way, but have to include the entire construction in the match to identify relevant contexts. In an interactive CQP session, we can use the target marker to obtain a frequency count with group (Sec. 3.4) or adjust the matching range with set (Sec. 3.7). However, this is not possible if CQP queries are executed via a Web interface such as CQPweb.
- A match selector is introduced by the keyword show and allows a subpart of the query to be extracted as the matching range, based on labels marking the first and last token of this range. In our example, this also frees the target marker to be set on the verb (which will no longer be part of the match).
> "it"%c [lemma="be"] [pos="DT"] noun:[pos="NNS?"] "that"%c @[pos="V.*"]
>   show noun .. noun;
- Specify match as the first label in order to leave the start of the match unmodified and/or matchend as the second label in order to leave the end unmodified. No other anchor names are allowed in the range specification. The trivial match selector show match..matchend has no effect, whereas show matchend..matchend is not accepted by the query parser.
- If one of the specified labels is undefined, the corresponding match will be discarded. The query
> [pos="DT"] adj:[pos="JJ.*"]? [lemma="nail"] show adj .. matchend;
will only return matches that include the optional adjective. Similarly, any invalid ranges (where the specified end token precedes the start token) will be discarded. The following query produces an empty result:
> start:[pos="DT"] [pos="JJ.*"]? end:[lemma="nail"] show end .. start;
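The discard rules above can be sketched in Python. This is only an illustrative model of the behaviour described in this section, not CQP's actual implementation; the function name `apply_selector` and the corpus positions are hypothetical.

```python
from typing import Dict, Optional, Tuple

def apply_selector(match: Tuple[int, int],
                   labels: Dict[str, int],
                   first: str,
                   second: str) -> Optional[Tuple[int, int]]:
    """Return the new (start, end) range, or None if the match is discarded."""
    start, end = match
    # "match" is only meaningful as the first anchor, "matchend" as the second;
    # anything else is looked up among the labels set by this match.
    new_start = start if first == "match" else labels.get(first)
    new_end = end if second == "matchend" else labels.get(second)
    if new_start is None or new_end is None:
        return None  # an undefined label discards the match
    if new_end < new_start:
        return None  # invalid range: end precedes start
    return (new_start, new_end)

# "the rusty nail" at hypothetical positions 10..12, adjective at 11
print(apply_selector((10, 12), {"adj": 11}, "adj", "matchend"))
# "the nail" at positions 20..21: the optional adjective was not matched
print(apply_selector((20, 21), {}, "adj", "matchend"))
# show end .. start always yields a reversed range
print(apply_selector((10, 12), {"start": 10, "end": 12}, "end", "start"))
```

The first call keeps the match with the narrowed range, while the other two return `None`, mirroring the two ways in which a match can be dropped.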
- Make sure that labels used in the match selector are always defined, especially in queries with disjunctions (each label must be set in all of the alternative branches). If you forget to set stop: in one of the branches below, that branch will effectively be erased from the query, because every match that goes through it is discarded.
> "from" (stop:[pos="PP"] | [pos="DT"] stop:[pos="NNS?"]) "to" show match..stop;
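To see why a branch without the label disappears from the results, consider the following Python sketch. It assumes a toy list of matches in which stop: was forgotten in the first branch; the data structure, the positions and the helper `show_match_to` are all hypothetical and only mimic the behaviour described above.

```python
from typing import Dict, List, Tuple

# Toy matches at hypothetical corpus positions. Each entry records which
# branch of the disjunction matched and which labels were set on the way.
# Here stop: was forgotten in branch 1, so its matches carry no label.
MATCHES: List[Dict] = [
    {"branch": 1, "range": (0, 2), "labels": {}},           # from him to
    {"branch": 2, "range": (5, 8), "labels": {"stop": 7}},  # from the station to
    {"branch": 1, "range": (12, 14), "labels": {}},         # from her to
]

def show_match_to(matches: List[Dict], label: str) -> List[Tuple[int, Tuple[int, int]]]:
    """Emulate `show match..<label>`: keep only matches where the label is set."""
    selected = []
    for m in matches:
        if label not in m["labels"]:
            continue  # undefined label: the match is discarded
        start = m["range"][0]
        selected.append((m["branch"], (start, m["labels"][label])))
    return selected

result = show_match_to(MATCHES, "stop")
print(result)  # only matches from branch 2 remain
```

All matches that went through branch 1 are dropped, so the query behaves as if that alternative had never been written.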
- Optionally, an offset can be added to the start and end position of the selector. This is convenient in order to shift the start and end of the matching range by a fixed number of tokens, e.g.
> "from" ([pos="PP"] | [pos="DT"] [pos="NNS?"]) "to" show match[1]..matchend[-1];
It is also convenient to adjust a label that cannot easily be set on the desired position, especially on the last token of an s-attribute region or a group of alternatives. The earlier version of this query, which has to set stop: in every branch of the disjunction, can be rewritten in a less error-prone way as
> "from" ([pos="PP"] | [pos="DT"] [pos="NNS?"]) stop:"to" show match .. stop[-1];
If the offset puts the start or end point outside the valid corpus range, the match is discarded.
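The offset arithmetic can be modelled with a few lines of Python. This is a sketch under stated assumptions rather than CQP's own code: the function `shifted_range`, the corpus size of 100 tokens and the example positions are hypothetical, and it is assumed that the invalid-range rule (end before start) is also applied after the offsets have been added.

```python
from typing import Optional, Tuple

def shifted_range(start_anchor: int, end_anchor: int,
                  start_offset: int, end_offset: int,
                  corpus_size: int) -> Optional[Tuple[int, int]]:
    """Apply selector offsets; return None if the match would be discarded."""
    new_start = start_anchor + start_offset
    new_end = end_anchor + end_offset
    # Positions outside 0 .. corpus_size-1 are not valid corpus positions.
    if not (0 <= new_start < corpus_size and 0 <= new_end < corpus_size):
        return None
    # Assumption: a reversed range is discarded here as well.
    if new_end < new_start:
        return None
    return (new_start, new_end)

# "from the station to" at hypothetical positions 5..8 in a 100-token corpus
print(shifted_range(5, 8, 1, -1, 100))   # show match[1]..matchend[-1]
print(shifted_range(5, 8, 0, -1, 100))   # show match..stop[-1], stop on "to" at 8
print(shifted_range(0, 2, -1, 0, 100))   # start shifted before the corpus start
```

The first two calls correspond to the two queries above and trim the prepositions from the range; the last call returns `None` because the shifted start falls outside the corpus.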
- The show clause has to be written in a specific position: after a global constraint (::) and within clause, but before any alignment constraints (Sec. 5.2), cut limit or expand clause.