- the additional keyword anchor can be set after query
execution by searching for a token that matches a given search
pattern (see Figure 3)
Figure 3: The set target command.
- example: find noun near adjective modern
> A = [(pos="JJ") & (lemma="modern")];
> set A keyword nearest [pos="NNS?"] within right 5 words from match;
- keyword should be underlined in KWIC display (may not work on some terminals)
- search starts from the given anchor point (excluding the anchored token
itself), or from the opening and closing boundaries of the match if
match is specified
- with inclusive, search includes the anchored token, or the
entire match, respectively
- from match is the default and can be omitted
- the match and matchend anchors can also be set,
modifying the actual matches [9]
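To make the search rule concrete, here is a minimal Python sketch of what a command such as `set A keyword nearest [pos="NNS?"] within right 5 words from match;` does for a single match. It is a toy model, not CQP code: the function name `nearest_right` and the representation of the corpus as a plain list of POS tags are assumptions made for illustration, and only the right-hand, non-inclusive variant of the search is modelled.

```python
import re

def nearest_right(pos_tags, matchend, pattern, window):
    """Return the corpus position of the nearest token to the right of the
    match whose tag matches `pattern`, or -1 if the window contains none.

    The search starts just after `matchend` (the match itself is excluded)
    and covers at most `window` tokens.
    """
    regex = re.compile(pattern)
    last = min(matchend + window, len(pos_tags) - 1)
    for cpos in range(matchend + 1, last + 1):
        # the regex has to match the whole tag, as in a CQP attribute test
        if regex.fullmatch(pos_tags[cpos]):
            return cpos
    return -1  # no suitable token: the anchor stays undefined

# hypothetical toy corpus: "a modern and very efficient design"
tags = ["DT", "JJ", "CC", "RB", "JJ", "NN"]
keyword = nearest_right(tags, matchend=1, pattern="NNS?", window=5)
```

With the match on position 1 (modern), the sketch sets the keyword on position 5 (design); with a window of only 3 words it finds nothing and returns -1.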
- Anchor positions can also be copied, possibly modifying the matching
ranges. In line with the complex form of the set target command
described above, the first anchor is the destination and the second the source:
set A target match;
set A matchend keyword;
- or they can be deleted from the named query result (keyword and target only, of course):
set A keyword NULL;
set A target NULL;
- an important use case is adjustments to the matching range if a query
needs to include additional context as a filter. For example, this query
attempts to identify noun phrases in object position (cf. Sec. 4.3),
then uses set to cut off the context filter before the NP [10]:
> NPobj = [pos="V.*"] [pos="RB"]* <np> @[] []* </np>;
> set NPobj match target;
> set NPobj target NULL;
- new in CQP v3.4.31: the undocumented and somewhat inconsistent behaviour
of the copy operation in the case of undefined or invalid anchor
positions [11] has been consolidated and is described precisely below. The reimplementation
also provides enhanced functionality: an offset can be added to shift
anchors to the left or right.
- Offsets are convenient if a fixed number of extra tokens of context have
been matched and need to be cut off, or to shift a target anchor that could
not easily be set at the desired position. A typical example is an anchor on
the last token of a group of alternatives (...|...|...), which would
have to be set in each branch of the group (and possibly multiple times if
there are optional elements at the end of a branch). It is much easier to
set the anchor on the following token (possibly with a zero-width marker
@[::], cf. Sec. 8.1), and then to
shift it one token to the left afterwards:
> set A target target[-1];
- The new implementation performs a conditional update of the
destination anchor by default. If the source anchor is undefined (i.e. -1)
or if the offset puts it outside the valid corpus range, the
destination remains unmodified. [12] The same
happens if the match or matchend anchor is modified and
would result in an invalid matching range (with matchend < match).
- the example below extends the match elephant(s) to start from the
keyword anchor, but only if it is defined and not to the right of
the match:
> Elephants = [lemma = "elephant"];
> set Elephants keyword leftmost [pos="JJ.*"] within 3 words;
> set Elephants match keyword;
- append an exclamation mark ! for a forced update, in
which an undefined (or out-of-range) source anchor always overwrites the
destination; if the destination is match or matchend, the
corresponding item is dropped from the query result. Retry the commands
above with
> set Elephants match keyword !;
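The conditional and forced update rules can be summarised in a short Python model. This is only a sketch of the behaviour described above, not CQP's implementation: the function `copy_anchor`, the dictionary representation of a query result item and the `corpus_size` parameter are hypothetical. The text does not say what a forced update does when the new range would be invalid (matchend < match); the sketch assumes that the item is dropped in that case as well.

```python
def copy_anchor(item, dest, src, offset=0, force=False, corpus_size=1000):
    """Toy model of `set A <dest> <src>[offset];` (force=True stands for `!`).

    `item` maps the anchors 'match', 'matchend', 'target' and 'keyword' to
    corpus positions, with -1 for an undefined anchor. Returns the updated
    item, or None if the item is dropped from the query result.
    """
    value = item[src]
    if value != -1:
        value += offset
        if not 0 <= value < corpus_size:
            value = -1  # the offset moved the anchor outside the corpus
    new = dict(item, **{dest: value})
    if value == -1:
        if not force:
            return item  # conditional update: destination stays as it is
        # forced update: match/matchend cannot be undefined, so drop the item
        return None if dest in ("match", "matchend") else new
    if new["matchend"] < new["match"]:
        # invalid range: unchanged if conditional; dropping the item on a
        # forced update is an ASSUMPTION of this sketch
        return None if force else item
    return new

# hypothetical items for the Elephants example (match on position 10)
hit = {"match": 10, "matchend": 10, "target": -1, "keyword": 8}
no_kw = dict(hit, keyword=-1)

extended = copy_anchor(hit, "match", "keyword")              # match moves to 8
kept = copy_anchor(no_kw, "match", "keyword")                # left unchanged
dropped = copy_anchor(no_kw, "match", "keyword", force=True) # None
```

In this model the conditional update keeps elephant matches without an adjective as they are, whereas the forced update removes them from the query result.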
- as a consequence of these rules, creating a query result that consists
of the single target token requires a sequence of three commands
in order to work reliably:
> Elephants = [lemma = "elephant"];
> set Elephants target nearest [pos="JJ.*"] within 3 words;
> set Elephants match target;      # S1
> set Elephants matchend target;   # S2
> set Elephants match target !;    # S3
Quiz: Can you explain why, without looking up the solution in the footnote? [13]