Publications and references

Citing CWB

If you use the CWB in your own research or as part of a software package, please acknowledge our work by citing one of the key publications, as specified below.

You may also find it useful to provide a reference URL (e.g. in a footnote). The appropriate URL for CWB is:

The IMS Open Corpus Workbench. URL: https://cwb.sourceforge.io/

Key publications

Our "key" references are the standard citations to use when CWB is employed in academic research. Ideally, every paper that makes use of CWB should cite one or more of the key references. The other papers listed below may be cited when their content is relevant, but are not "key" in this sense.

The current standard reference on the design and functionality of CWB is the following article:

Evert, Stefan and Hardie, Andrew (2011). Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In Proceedings of the Corpus Linguistics 2011 conference, University of Birmingham, UK.
This paper expands on the presentation given by the CWB lead developers at the CL2011 conference in Birmingham in July 2011. As well as a very basic introduction, it explains the changes made for versions 3.0 to 3.4. The link above is to a copy of the paper hosted at the University of Birmingham; if that link becomes inaccessible, a mirror of the paper is available on this website.

The standard reference for CQPweb is the following:

Hardie, Andrew (2012). CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics 17 (3): 380–409. [alternative link]

The present standard reference on Ziggurat is the following:

Evert, Stefan and Hardie, Andrew (2015). Ziggurat: A new data model and indexing format for large annotated text corpora. In Proceedings of the 3rd Workshop on the Challenges in the Management of Large Corpora (CMLC-3), pages 21–27, Lancaster, UK. (PDF)

Other papers

These are the original references for the IMS Corpus Workbench (as distributed in 1997). These papers are seriously out of date now and should only be cited for historical completeness or to credit the original developers.

Christ, Oliver (1994). A modular and flexible architecture for an integrated corpus query system. In Papers in Computational Lexicography (COMPLEX '94), pages 22–32, Budapest, Hungary.
Christ, Oliver (1994). Chapter 2 of Corpus administrator's manual. Technical report, IMS, University of Stuttgart.
Christ, Oliver and Schulze, Bruno M. (1996). Ein flexibles und modulares Anfragesystem für Textcorpora. In H. Feldweg and E. W. Hinrichs, editors, Lexikon und Text, pages 121–133. Max Niemeyer Verlag, Tübingen.

Not by us, but of relevance: the implementation of corpus storage and index formats in CWB versions 1 to 3 closely follows Witten et al. (1999).

Witten, Ian H.; Moffat, Alistair; Bell, Timothy C. (1999). Managing Gigabytes. Morgan Kaufmann Publishing, San Francisco, 2nd edition.

Other useful references:

Hoffmann, Sebastian; Evert, Stefan; Smith, Nicholas; Lee, David; Berglund Prytz, Ylva (2008). Corpus Linguistics with BNCweb – a Practical Guide, volume 6 of English Corpus Linguistics. Peter Lang, Frankfurt am Main.
Chapter 12 gives a concise, beginner-level introduction to the CQP query language with many examples and exercises. It is a good alternative to the CQP Query Language Tutorial and can be used together with the online version of BNCweb for practical exercises. In addition, a very gentle introduction to simple queries in CEQL syntax (used by various Web interfaces) can be found in Chapter 6.

Recent and significant conference presentations

Evert, Stefan and Hardie, Andrew (2021). Ziggurat v0.1: A next-generation system for modelling, storing, and retrieving corpus (and other) data. Presentation at Corpus Linguistics 2021, Limerick (online). [Slides] [Video on YouTube]
Hardie, Andrew (2021). Extensibility as a focus for corpus analysis software: The CQPweb plugin framework. Presentation at Corpus Linguistics 2021, Limerick (online). [Slides] [Video on YouTube] [Transcript]
Evert, Stefan and Hardie, Andrew (2011). Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. Presentation at Corpus Linguistics 2011, University of Birmingham, UK.
Evert, Stefan (2008). Inside the IMS Corpus Workbench. Presentation at the IULA, Universitat Pompeu Fabra, Barcelona, Spain. Includes a historical overview of the CWB project.