modnlp-tec: on-line Translational English Corpus
Start the TEC concordancing tool by clicking on the icon below.
The TEC corpus browser uses Java™
Web Start technology. If the tool fails to start when you click on the icon above,
you might have to download or upgrade Java on your machine. See
the Java Web Site for details.
N.B.: If the tool does not start from the link above, try
downloading the latest
version of modnlp-teccli (this link), uncompressing it, and
running teccli.jar (by double-clicking it, for instance, or by typing "java -jar teccli.jar" at a command prompt).
Once the application starts, you will see a dialogue similar to the
one shown on the right.
Select 'Choose new remote corpus' and
enter ronaldo.cs.tcd.ie:1240 into the window that will
pop up. After a few seconds, the concordancer should appear. If you are behind a firewall, set the proxy server by selecting "Options" on the menu bar.
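If the concordancer does not appear, it can help to check whether the server address is reachable from your machine at all. The Python sketch below is not part of TEC: it only tries to open a plain TCP connection to the host and port you typed, and the helper names parse_server and server_reachable are hypothetical. It does not speak the TEC protocol and does not go through any proxy configured in the tool, so a False result behind a firewall suggests you need the proxy setting, while True only means that the port accepted a connection.

```python
import socket


def parse_server(address: str) -> tuple[str, int]:
    """Split a 'host:port' string such as 'ronaldo.cs.tcd.ie:1240'."""
    host, sep, port = address.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def server_reachable(address: str, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port can be opened.

    Hypothetical diagnostic helper: it ignores any proxy settings and
    does not check that the server is actually a TEC corpus server.
    """
    host, port = parse_server(address)
    try:
        # create_connection resolves the host name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures
        return False
```

For example, server_reachable("ronaldo.cs.tcd.ie:1240") returns True if the port answers directly and False otherwise.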
Quick tutorial
In addition to the standard concordancer, the browser implements a few
other tools. They allow you to restrict your search to sub-corpora
defined according to certain features, to display a summary of the
files contained in the TEC corpus, and to display frequency lists for
the various selectable sub-corpora.
Selecting sub-corpora
The sub-corpora selection tool allows you to restrict the results of
concordancing queries and the contents of frequency tables to sections
of files matching certain selection criteria. These criteria can be,
for example, author, translator, translator gender, source language,
translator nationality, etc. In order to select a sub-corpus, choose
"Options->Select sub-corpus...".
A window similar to the one shown below should appear.
The menu boxes allow you to select one or more items describing the texts
to be included in the desired sub-corpus. The menu boxes can be
connected to form the logical expressions that ultimately
determine what gets included or excluded. Ticking the 'exclude' checkbox
below a menu box causes the items selected in that box to be
excluded.
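This guide does not spell out exactly how the boxes combine, so the Python sketch below assumes one plausible reading: items selected within a box are alternatives (OR), separate boxes must all hold (AND), and a ticked 'exclude' box inverts its own test. The class and function names, the attribute names, the values and the file ids are all hypothetical and are not TEC's actual metadata fields or code.

```python
from dataclasses import dataclass, field


@dataclass
class MenuBox:
    """One menu box: a metadata attribute, the values selected in it, and the 'exclude' flag."""
    attribute: str                          # hypothetical attribute name, e.g. "source_language"
    selected: set[str] = field(default_factory=set)
    exclude: bool = False

    def matches(self, text: dict[str, str]) -> bool:
        if not self.selected:
            return True                     # an empty box places no restriction
        hit = text.get(self.attribute) in self.selected   # OR within a box (assumption)
        return not hit if self.exclude else hit


def select_subcorpus(texts: list[dict[str, str]], boxes: list[MenuBox]) -> list[dict[str, str]]:
    """Keep the texts that satisfy every box (AND across boxes, an assumption)."""
    return [t for t in texts if all(box.matches(t) for box in boxes)]


# Hypothetical metadata records for three texts
texts = [
    {"id": "fn001", "source_language": "Portuguese", "translator_gender": "female"},
    {"id": "fn002", "source_language": "German", "translator_gender": "male"},
    {"id": "fn003", "source_language": "Portuguese", "translator_gender": "male"},
]
boxes = [
    MenuBox("source_language", {"Portuguese"}),
    MenuBox("translator_gender", {"male"}, exclude=True),
]
print([t["id"] for t in select_subcorpus(texts, boxes)])
```

Under these assumptions the query reads "translated from Portuguese AND NOT by a male translator", so only fn001 is kept.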
Clicking 'OK' activates the sub-corpus selection. In order to
de-activate it (that is, allow search on the full corpus), choose
"Options" and de-select "Activate sub-corpus".
Displaying a frequency list
Select "Plugins->Word Frequency List". The following window will
appear.
Select the range of ranked items to display (the default is to display the
500 most common terms) and click on "Get List" to retrieve their
frequency table. This table can be saved to a CSV file, which you can
then manipulate in spreadsheet software if you like.
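To make the idea of a ranked list with a selectable rank range concrete, here is a small Python sketch that builds one and writes it out as CSV. It is an illustration only, not the TEC plugin's code: the function names are hypothetical, it uses naive whitespace tokenisation, it breaks ties by Counter's rule (equal counts keep first-seen order), and its rank,term,frequency column layout is an assumption that may differ from the file TEC saves.

```python
import csv
import io
from collections import Counter


def frequency_table(tokens, first_rank=1, last_rank=500):
    """Return (rank, term, frequency) rows for ranks first_rank..last_rank inclusive."""
    counts = Counter(tokens)
    # most_common() sorts by descending count; equal counts keep first-seen order
    ranked = counts.most_common()
    return [
        (rank, term, freq)
        for rank, (term, freq) in enumerate(ranked, start=1)
        if first_rank <= rank <= last_rank
    ]


def write_csv(rows, stream):
    """Write the rows with a header line (hypothetical column names)."""
    writer = csv.writer(stream)
    writer.writerow(["rank", "term", "frequency"])
    writer.writerows(rows)


tokens = "the cat saw the dog and the dog saw the cat".split()
rows = frequency_table(tokens, 1, 3)
buf = io.StringIO()                 # use open("freq.csv", "w", newline="") for a real file
write_csv(rows, buf)
print(buf.getvalue())
```

With ranks 1 to 3, the toy sentence yields "the" (4 occurrences), then "cat" and "saw" (2 each); "dog" also occurs twice but falls outside the range only because of the tie-breaking rule, which is worth keeping in mind when a cut-off falls inside a group of equal frequencies.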
Displaying general corpus information
Select "Plugins->Corpus Description Browser". A window will appear
listing each file in the corpus, the major sub-corpus it belongs
to (i.e. fiction, newspapers, biography, and in-flight
magazines), the number of tokens it contains and its type-token
ratio. At the bottom of the window you will see the total number of
tokens in the corpus and the overall type-token ratio.
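The type-token ratio is the number of distinct word forms (types) divided by the number of running words (tokens). The Python sketch below computes it per file and for the whole corpus. It assumes, as seems natural but is not confirmed by this guide, that the overall figure is computed over the pooled corpus, where a word form shared by several files counts as a single type. The function names are hypothetical, and tokenisation is a naive split on whitespace rather than whatever TEC uses.

```python
def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def describe_corpus(files: dict[str, list[str]]):
    """Return per-file (name, tokens, TTR) rows, total tokens and pooled TTR."""
    rows = []
    all_types: set[str] = set()
    total_tokens = 0
    for name, tokens in files.items():
        rows.append((name, len(tokens), type_token_ratio(tokens)))
        all_types.update(tokens)        # a type shared by several files is counted once
        total_tokens += len(tokens)
    overall = len(all_types) / total_tokens if total_tokens else 0.0
    return rows, total_tokens, overall


files = {
    "a.txt": "the cat saw the dog".split(),
    "b.txt": "the dog ran".split(),
}
rows, total, overall = describe_corpus(files)
print(rows, total, overall)
```

In this toy example a.txt has a ratio of 0.8 and b.txt of 1.0, but the overall ratio is 5/8 = 0.625, not their mean of 0.9, because "the" and "dog" occur in both files. For the same reason, ratios of files of very different lengths are not directly comparable: longer texts repeat more words and tend to have lower ratios.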