9.4 Encoding the aligner's output

An alignment attribute is added to an existing CWB corpus, which must be the source corpus of the alignment (not the target). There are two steps in this process.

The first step is to declare the new alignment attribute in the source corpus's registry file.

So, find the holmes-en file in the registry directory, and edit it to add the following line:

ALIGNED holmes-de
(note the use of the lowercase spelling of the attribute name!)

This declares an a-attribute linking this corpus to the HOLMES-DE corpus. An a-attribute always has the same name as its target corpus, written in lowercase.
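If you would rather script the manual edit than open the registry file in an editor, the following is a minimal sketch. The function name add_aligned_line is hypothetical, and the path to the registry file is something you must supply yourself. It appends the declaration only if an identical line is not already present, so it is safe to run more than once.

```shell
# Hypothetical helper: append an ALIGNED declaration to a registry file,
# unless exactly that line is already there.
add_aligned_line() {
    registry_file=$1   # path to the source corpus's registry file (you supply this)
    target=$2          # lowercase name of the target corpus, e.g. holmes-de
    # -x: match whole lines only; -F: treat the pattern as a fixed string
    if grep -q -x -F "ALIGNED $target" "$registry_file"; then
        echo "already declared: ALIGNED $target"
    else
        printf 'ALIGNED %s\n' "$target" >> "$registry_file"
        echo "added: ALIGNED $target"
    fi
}
```

For example, add_aligned_line /path/to/registry/holmes-en holmes-de, where the registry path shown is a placeholder. The sketch assumes the registry file already ends with a newline.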

If you've got the CWB/Perl tools installed, you can use cwb-regedit to make this change, rather than manually editing the registry file. The command in this case would be as follows:

$ cwb-regedit HOLMES-EN :add :a holmes-de

Once the registry file has been updated, the second and final step is to encode the alignment attribute:

$ cwb-align-encode -D holmes.align
(this command runs very fast and prints no output if everything has gone OK).

There is only one argument to cwb-align-encode: the name of the text file containing the alignment data. It is not necessary to name either of the corpora, because the holmes.align file contains both names.
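If you want to check which corpora an alignment file refers to before encoding it, you can look at the top of the file. The sketch below assumes that the corpus names are recorded in a header on the first line of the file. That is an assumption about the file layout rather than something stated above, so inspect your own file to confirm. The function name show_align_header is hypothetical.

```shell
# Hypothetical helper: print the first line of an alignment file.
# Assumption: the source and target corpus names are stored on that line.
show_align_header() {
    head -n 1 "$1"
}
```

For example, show_align_header holmes.align should print a line mentioning both HOLMES-EN and HOLMES-DE if the assumption holds.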

It is, however, always necessary to state where you want the encoded files to be placed. The recommended way to do this is the method shown above: with the -D option. This puts the a-attribute's data files in the same directory used for the corpus's other attributes (as specified in the registry file).

Alternatively, you can specify a different location with the -d option.
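As an illustration of the -d variant, here is a minimal sketch. The function name encode_alignment_to and the RUN variable are hypothetical conveniences, not part of CWB: leave RUN unset to execute the commands, or set RUN=echo to print them without running anything. The output directory is whatever path you choose.

```shell
# Hypothetical wrapper: encode an alignment file into a directory of your choice.
# Set RUN=echo to print the commands instead of executing them (dry run).
encode_alignment_to() {
    out_dir=$1      # directory to receive the a-attribute's data files
    align_file=$2   # e.g. holmes.align
    # Make sure the output directory exists before encoding
    ${RUN:-} mkdir -p "$out_dir" || return 1
    # -d takes the directory as its argument (unlike -D, which takes none)
    ${RUN:-} cwb-align-encode -d "$out_dir" "$align_file"
}
```

For example, RUN=echo encode_alignment_to /path/to/output holmes.align shows the two commands that would be run; the output path here is a placeholder.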

Once encoding is complete, it's safe to delete the holmes.align file.

This procedure only creates an a-attribute in HOLMES-EN, linking it to HOLMES-DE. If you also want an a-attribute in HOLMES-DE linking it to HOLMES-EN, you must repeat the procedure with the source and target corpora switched.

You can either re-run the aligner, or re-use the same holmes.align file in “reverse mode”. cwb-align-encode's reverse mode switches the source and target corpora from what is specified in the .align file. Reverse mode only works, however, if there are no crossing beads in your alignment.

That is, first run

$ cwb-regedit HOLMES-DE :add :a holmes-en

and then add the a-attribute data to HOLMES-DE. Assuming that we are re-using the same holmes.align file for this step, as explained above, we need the -R option to engage reverse mode:

$ cwb-align-encode -D -R holmes.align
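The whole procedure, in both directions, can be collected into a single script. The following is a minimal sketch that uses only the commands shown in this section. The function name align_both_ways and the RUN variable are hypothetical: leave RUN unset to execute the commands, or set RUN=echo to print them for checking first. It assumes the CWB/Perl tools are installed (for cwb-regedit) and that the alignment contains no crossing beads.

```shell
# Sketch: declare and encode an alignment in both directions from one .align file.
# Set RUN=echo to print the commands instead of executing them (dry run).
align_both_ways() {
    source_corpus=$1   # e.g. HOLMES-EN
    target_corpus=$2   # e.g. HOLMES-DE
    align_file=$3      # e.g. holmes.align
    # An a-attribute is named after the corpus it points to, in lowercase
    source_attr=$(printf '%s' "$source_corpus" | tr '[:upper:]' '[:lower:]')
    target_attr=$(printf '%s' "$target_corpus" | tr '[:upper:]' '[:lower:]')

    # Forward direction: a-attribute in the source corpus, pointing at the target
    ${RUN:-} cwb-regedit "$source_corpus" :add :a "$target_attr" || return 1
    ${RUN:-} cwb-align-encode -D "$align_file" || return 1

    # Reverse direction: a-attribute in the target corpus, pointing at the source
    ${RUN:-} cwb-regedit "$target_corpus" :add :a "$source_attr" || return 1
    ${RUN:-} cwb-align-encode -D -R "$align_file" || return 1
}
```

For example, RUN=echo align_both_ways HOLMES-EN HOLMES-DE holmes.align prints the four commands used in this section, in order. Each step stops the function on failure, so a problem with the registry edit does not go on to the encoding step.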