CQPweb
About CQPweb
CQPweb is a web-based graphical user interface (GUI) for some elements of the CWB - and in particular, the CQP query processor. Thus the name.
CQPweb is designed to replicate the user-interface of the popular BNCweb tool, which also uses CQP as a back-end. Like BNCweb, CQPweb uses a database alongside the CWB to provide extra functions beyond those built into CWB/CQP. However, unlike BNCweb, CQPweb can be used with any corpus.
CQPweb is especially suitable for students, non-linguists, and others for whom a Unix-like command-line is a terrifying prospect.
CQPweb can be used in three ways.
- Via a public server. There are many of these out there; the one run by Andrew Hardie, CQPweb's main developer, is https://cqpweb.lancs.ac.uk.
- By getting a copy of the code and installing it directly on your own computer. See below.
- By downloading CQPwebInABox, a Virtual PC which has CQPweb pre-installed (with two sample corpora included!): see our page on CQPwebInABox.
CQPweb features
Here is a quick summary of the extra functionality provided by CQPweb, beyond what you can do with CQP alone (and not mentioning, of course, the convenience of the web-GUI):
- The CEQL simple query language. Built in, and enabled by default in CQPweb, this simplifies the regular expression language of CQP and makes its powerful search syntax more accessible for beginners. CEQL gives access to lemma and POS information, if available in the corpus, via the easy shortcuts of word_POSTAG and {lemma}.
- Caching. All CQPweb queries are cached - so if you run the same query again the results are found much faster. Very convenient if you have a class of 30 students all running the same queries within the space of a one-hour workshop!
- Query sorting. Queries can be sorted by adjacent words (or tags) going left or right; queries can also be thinned on the basis of words (or tags) appearing at specific positions relative to the query “hit”. Random sorting is also available.
- Collocations. A quick quantitative summary of the neighbourhood of the query results in the corpus. A range of different collocation statistics are supported, including Z-score, Mutual Information and Log-likelihood, and collocation can be done on tags as well as words.
- Distribution. See how the results for a query are distributed across the text categories in the corpus. You can then thin a query to just the results in a particular text category.
- Multiple query postprocesses. Any thinned query you create with the distribution, sort, or collocation functions (postprocesses) can in turn be postprocessed. There is no limit to how many different postprocesses can be applied to a query - and all queries can be saved in a user's own space.
- Manual annotation. Classify the hits of a query manually using the Categorise function. Then filter results according to how you have annotated them.
- Context. View up to several hundred words of context each way in the original, underlying text.
- Right-to-left support. Scripts such as Arabic and Hebrew can be displayed correctly.
- Text metadata. Every CQPweb corpus has a database of information on each text. This can include categorisation schemes (e.g. “Written texts versus spoken texts” or “What decade was the text written in?”) as well as bibliographical information (e.g. author, title, etc.).
- Query history. Each user has an individual query history list, showing all the queries they have run. This makes it easy to keep track of what you have been doing or to resurrect old queries without the bother of re-typing them!
- Subcorpora. A subset of texts can be defined on the basis of metadata or by taking all texts which have a “hit” for a particular query. Then you can see frequency lists for the subcorpus, or run queries which search just within that subcorpus.
- Keywords. Quickly get lists of keywords based on comparing subcorpus frequency lists. You can also use the frequency list of another corpus on the system as the reference corpus. Key tags (e.g. POS tags, semantic tags) are also supported.
- User corpora. Users can upload their own corpus data, annotate it, and share it with colleagues who have accounts on the same CQPweb server.
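To make the collocation statistics named above more concrete, here is a minimal Python sketch of the textbook formulas for Mutual Information, Z-score and Log-likelihood, computed from a simple 2x2 contingency table. This is an illustration only: the function name and parameters are hypothetical, and CQPweb's own implementation (which works with a collocation window around each hit) may calculate expected frequencies differently.

```python
import math


def collocation_scores(o11, node_freq, colloc_freq, corpus_size):
    """Textbook association scores for a node/collocate pair.

    o11         -- observed co-occurrences of node and collocate
    node_freq   -- total frequency of the node word
    colloc_freq -- total frequency of the collocate
    corpus_size -- total number of tokens

    Hypothetical helper for illustration; not CQPweb's actual code.
    """
    # Observed 2x2 contingency table.
    o12 = node_freq - o11
    o21 = colloc_freq - o11
    o22 = corpus_size - node_freq - colloc_freq + o11
    observed = [o11, o12, o21, o22]

    # Expected frequencies under independence (row total * column total / N).
    rows = [node_freq, corpus_size - node_freq]
    cols = [colloc_freq, corpus_size - colloc_freq]
    expected = [r * c / corpus_size for r in rows for c in cols]
    e11 = expected[0]

    # Mutual Information: log2 of observed over expected co-occurrence.
    mi = math.log2(o11 / e11) if o11 > 0 else float("-inf")

    # Z-score: deviation from expectation in units of sqrt(expected).
    z = (o11 - e11) / math.sqrt(e11)

    # Log-likelihood (G2): sum over all four cells; empty cells contribute 0.
    ll = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

    return {"MI": mi, "Z": z, "LL": ll}


if __name__ == "__main__":
    # Made-up frequencies, purely for demonstration.
    print(collocation_scores(o11=30, node_freq=100, colloc_freq=200, corpus_size=10000))
```

When the observed co-occurrence count equals the expected count, all three scores are zero; the further the observed count rises above expectation, the higher they climb, though each statistic weights rare and frequent items differently.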
In addition, CQPweb offers the following extra tools for the system administrator:
- Simplified indexing. The admin interface provides a quick-and-easy set of web-forms for indexing a corpus without having to log on to the command line and use the CWB utilities.
- User management. Easily create and manage user accounts; assign users to groups; and manage the access rights granted to each user or group - every corpus on the system can have a different configuration of users/groups given access to it.
- Access limits. Configurable on a per-corpus basis: you can restrict the amount of context a user can view. This is especially useful if you can't give your users access to the full text of a given corpus for licensing or other copyright/ethical reasons.
- Configurable query language. Configurable on a per-corpus basis: the CEQL syntax for accessing POS tags and lemmata can be re-targeted to whatever alternative tags your corpus may have.
CQPweb is still being extended and several new features are in the experimental stage. Expect the feature list to grow over time!
How to install
Two versions of CQPweb are presently available (see below for further info on versions).
- The stable version 3.2 (if you want as few bugs as possible): click here to download
- The cutting-edge version 3.3 (if you want all the new features): get it from the SourceForge repository as described here.
- Installation instructions can be found in the system administrator's manual.
Older versions
The main versions of CQPweb that have been released are listed below. Some are still in use by various sites. You can find out what version is running on a given server by looking at the footer of any CQPweb page on that server.
- Version 3.3: The current development version, known to still contain bugs. V 3.3.15 is (as of May 2022) the latest version. Requires the most recent available version of the core.
- Version 3.2: The current stable version, recommended for most users. V 3.2.43 is (as of 2021) the latest version with the most bugfixes. Requires at least v3.4 of the core, ideally not an old one.
- Version 3.1: While still used by some servers as of 2020, this series is now obsolete and unsupported.
- Version 3.0: Obsolete.
- Version 2.x: Obsolete.
- Version 1.x: Obsolete.
A full changelog history for CQPweb is contained within its own code; look at Latest news on the main corpus menu of any live server, or at the file lib/info-forms.php.
Read more
To learn more about CQPweb, see:
- Hardie, Andrew (2012). CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics 17 (3): 380-409. [alternative link]
- The tutorial videos on our YouTube channel.
- Also of interest: Peter Uhrig's guide to installing CQPweb on Windows Subsystem for Linux