First, let's introduce the tutorial data we'll be working with. All the files mentioned here are available as part of the data package provided alongside the CWB Encoding Manual. The corpus we'll use to practice alignment consists of a very short excerpt from the novel The Hound of the Baskervilles by Arthur Conan Doyle, which we'll call the Holmes corpus after the main character. As well as the original English, we have a German translation of the same text. We'll use the CWB label HOLMES-EN for the source corpus and HOLMES-DE for the target corpus (i.e. the translation). Distinguishing the components of a parallel corpus by a language-code suffix in this way is a convenient convention for labelling aligned corpora in CWB.
Before going any further, you should index these two corpora, using the following commands:
$ cwb-encode -d /corpora/data/example -c utf8 -f holmes_en.vrt -R /usr/local/share/cwb/registry/holmes-en -P pos -P lemma -S s+id -S p+num
$ cwb-encode -d /corpora/data/example -c utf8 -f holmes_de.vrt -R /usr/local/share/cwb/registry/holmes-de -P pos -P lemma -S s+id -S p+num
(you should, of course, amend the -d and -R options to suit your own setup).
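In a typical CWB workflow, cwb-encode writes the raw data files and the registry entry, and a separate step then builds the index files. The following is a minimal sketch of that step, assuming cwb-makeall is installed and that both registry files are in the default registry. The CORPORA variable and the check for the tool are illustrative additions, not part of the tutorial data package.

```shell
# Corpus IDs as declared by the registry files created above.
CORPORA="HOLMES-EN HOLMES-DE"

for corpus in $CORPORA; do
    if command -v cwb-makeall >/dev/null 2>&1; then
        # -V asks cwb-makeall to validate the index files it builds.
        cwb-makeall -V "$corpus"
    else
        echo "cwb-makeall not found on PATH; skipping $corpus" >&2
    fi
done
```

If your registry is not in the default location, the same command accepts the -r option to name the registry directory.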
All the example commands given in the following sections are based on these two corpora. They do not include the -r option to specify the registry directory location. If you have placed the registry files for the two corpora anywhere other than the default registry, you will need either to add the -r option, or else to use the CWB_REGISTRY environment variable.
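The two ways of pointing CWB at a non-default registry can be sketched as follows. The directory /corpora/registry is a hypothetical location chosen for illustration, and cwb-describe-corpus is used here only as a convenient command for checking that both corpora can be found.

```shell
# Hypothetical registry location; replace with the directory that holds
# your holmes-en and holmes-de registry files.
REGISTRY_DIR="/corpora/registry"

# Option 1: set CWB_REGISTRY once, so that later CWB commands run from
# this shell look in that directory.
export CWB_REGISTRY="$REGISTRY_DIR"

# Option 2: pass -r explicitly on each command instead.
if command -v cwb-describe-corpus >/dev/null 2>&1; then
    cwb-describe-corpus -r "$REGISTRY_DIR" HOLMES-EN
    cwb-describe-corpus -r "$REGISTRY_DIR" HOLMES-DE
fi
```

Setting the environment variable is usually less repetitive when you run many commands in one session, while -r keeps each command self-contained.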