- modelling of complex word sequences with regular expressions over patterns (i.e. tokens): every [...] expression is treated like a single character (or, more precisely, a character set) in a conventional regular expression
- token-level regular expressions use a subset of the POSIX syntax
- repetition operators: ? (0 or 1), * (0 or more), + (1 or more), {n} (exactly n), {n,m} (between n and m)
- grouping with parentheses: (...)
- disjunction operator: | (separates alternatives)
- parentheses delimit the scope of a disjunction: ( alt1 | alt2 | ... )
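To make the "single character" analogy concrete, here is a minimal Python sketch. It is not how CQP is implemented. Each distinct part-of-speech tag is mapped to one private-use character. A token pattern of the assumed form [pos="..."] becomes the character set of all tags that its regex matches in full. The operators ?, *, +, {n}, {n,m}, parentheses and | are passed straight through to Python's re module. The helper find_matches, the toy sentence and its Penn Treebank-style tags are hypothetical. Matches are found with Python's leftmost, non-overlapping search, which need not coincide with CQP's own matching strategy.

```python
import re

# Matches a single token pattern of the (assumed) form [pos="regex"].
TOKEN_PATTERN = re.compile(r'\[\s*pos\s*=\s*"([^"]*)"\s*\]')


def find_matches(query, tagged):
    """Return the (start, end) token spans that `query` matches in `tagged`.

    `tagged` is a list of (word, pos) pairs; only [pos="..."] patterns are
    supported, and all other characters are passed through as regex syntax.
    """
    # Give every distinct POS tag its own private-use character, so that the
    # token sequence becomes an ordinary string with one character per token.
    tags = sorted({pos for _, pos in tagged})
    char_of = {tag: chr(0xE000 + i) for i, tag in enumerate(tags)}
    encoded = "".join(char_of[pos] for _, pos in tagged)

    def to_char_set(match):
        # A [...] pattern becomes the set of characters whose tag matches the
        # attribute regex in full (attribute regexes are anchored in CQP).
        value_re = re.compile(match.group(1))
        chars = "".join(c for tag, c in char_of.items() if value_re.fullmatch(tag))
        return f"[{chars}]" if chars else "(?!)"  # (?!) never matches

    # Translate token patterns, then drop the whitespace that separates tokens.
    char_regex = re.sub(r"\s+", "", TOKEN_PATTERN.sub(to_char_set, query))
    # Leftmost, non-overlapping search; empty matches are skipped.
    return [m.span() for m in re.finditer(char_regex, encoded) if m.end() > m.start()]


SENTENCE = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
            ("the", "DT"), ("old", "JJ"), ("red", "JJ"), ("mat", "NN")]

print(find_matches('[pos="JJ"]{2}', SENTENCE))  # [(5, 7)]
print(find_matches('[pos="DT"] ([pos="NN"] | [pos="JJ"]+ [pos="NN"])', SENTENCE))  # [(0, 2), (4, 8)]
print(find_matches('[pos="IN"] [pos="DT"]? [pos="JJ.*"]* [pos="NN"]', SENTENCE))  # [(3, 8)]
```

Because every token pattern collapses to exactly one character, spans in the encoded string are directly token positions. This is why the usual regular-expression operators carry over unchanged to the token level.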
Figure 2 shows simple queries matching prepositional
phrases (PPs) in English and German. The query strings are spread over
multiple lines to improve readability, but each one has to be entered on a
single line in an interactive CQP session.
Figure 2: Simple queries matching PPs in English and German.
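The practice of laying a query out over several lines for readability and then entering it on a single line can be sketched in Python. The PP query below is a hypothetical English example, not the one in Figure 2. It assumes Penn Treebank-style tags in a pos attribute. The helper one_line joins the lines of the query. The helper find_matches is a hypothetical toy matcher that treats each [pos="..."] pattern as a character set in an ordinary regular expression.

```python
import re

# Matches a single token pattern of the (assumed) form [pos="regex"].
TOKEN_PATTERN = re.compile(r'\[\s*pos\s*=\s*"([^"]*)"\s*\]')

# Hypothetical English PP query (not the query shown in Figure 2), laid out
# over several lines for readability; assumes Penn Treebank-style tags.
PP_QUERY = """
    [pos="IN"]
    [pos="DT"]?
    ( [pos="RB"]? [pos="JJ.*"] )*
    [pos="NN.*"]+
"""


def one_line(query):
    """Join a multi-line query into the single line an interactive session expects."""
    return " ".join(line.strip() for line in query.splitlines() if line.strip())


def find_matches(query, tagged):
    """Return (start, end) token spans; each [pos="..."] acts as one character set."""
    tags = sorted({pos for _, pos in tagged})
    char_of = {tag: chr(0xE000 + i) for i, tag in enumerate(tags)}
    encoded = "".join(char_of[pos] for _, pos in tagged)

    def to_char_set(match):
        # Characters of all tags that the attribute regex matches in full.
        value_re = re.compile(match.group(1))
        chars = "".join(c for tag, c in char_of.items() if value_re.fullmatch(tag))
        return f"[{chars}]" if chars else "(?!)"

    char_regex = re.sub(r"\s+", "", TOKEN_PATTERN.sub(to_char_set, query))
    return [m.span() for m in re.finditer(char_regex, encoded) if m.end() > m.start()]


SENTENCE = [("He", "PRP"), ("lives", "VBZ"), ("in", "IN"), ("the", "DT"),
            ("very", "RB"), ("old", "JJ"), ("house", "NN"), ("near", "IN"),
            ("London", "NNP")]

query = one_line(PP_QUERY)
print(query)  # [pos="IN"] [pos="DT"]? ( [pos="RB"]? [pos="JJ.*"] )* [pos="NN.*"]+
for start, end in find_matches(query, SENTENCE):
    print(" ".join(word for word, _ in SENTENCE[start:end]))
# in the very old house
# near London
```

Joining with single spaces is safe here because whitespace between token patterns carries no meaning in the query. The sketch assumes that no line break falls inside a quoted attribute value.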