The Collocation Tab
On this tab you can run statistical calculations against the current
set of concordance lines you have on the Concordance Tab.
The Display
The tab is divided into two areas. On the right the table of collocational
data is displayed, and on the left are all the configuration options you can
set to influence your calculation.
The Actions
Calculation Options
- Evaluation Function
- There is a selection of evaluation functions available that
you can choose from to analyse the data.
- mutual information, the classical formula as initially
described by Church & Hanks.
- mi (double), mutual information with a weighting factor
to boost the scores of more frequent words. This counterbalances
the fact that mutual information often comes up with freak words as
collocates due to their rareness.
- mi (triple), mutual information with an even higher weight
for frequent words.
- t-score, the statistical t-test, as adapted by Church &
Hanks. Scores better on high frequency words than the mutual
information metric.
- z-score, a long-established metric that expresses the difference
between observed and expected frequency in standard deviations.
- observed/expected, the ratio of observed frequency over
expected frequency.
- log-likelihood and
- modified log-likelihood, a measurement proposed by Adam
Kilgarriff and Ted Dunning.
- Enhancer
- You have two options to enhance your calculation.
- Direction
- For each word a distinction is made between its occurrences on the
left and on the right of the node. This can often reveal an asymmetrical
distribution, where the occurrences on one side get a higher score than
those on both sides combined.
- Case Fold
- When this option is selected, all words are converted to upper case.
This is useful for combining accidentally capitalised words (e.g. at
the beginning of a sentence), but can blur the distinction between
proper nouns and common nouns.
- Environment
- Two parameters to set here:
- Span
- The span defines how many words are taken into account for
collecting the collocates on either side. The extent of the span
can be chosen separately for the left and the right.
- Cut-Off
- The cut-off allows you to exclude rare and freak words from the
calculation. Words that occur rarely are often typographical errors
or weird names, which usually get high scores due to their rareness.
All words that do not occur more often than the specified value are
ignored for the purpose of the calculation.
- Calculate
- When you have set all the options you want, click Calculate
to produce a table of collocations.
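The way these options interact can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the function name collect_collocates and its parameters are hypothetical, the input is assumed to be a plain list of tokens rather than concordance lines, and the cut-off is assumed to apply to the joint frequency.

```python
from collections import Counter


def collect_collocates(tokens, node, left=4, right=4,
                       case_fold=False, direction=False, cut_off=0):
    """Count the words found within the span around each occurrence of `node`.

    Hypothetical sketch: option names mirror the tab, not a real API.
    """
    if case_fold:
        # Case Fold: treat "Tea" and "tea" as the same word.
        tokens = [t.upper() for t in tokens]
        node = node.upper()

    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Span: up to `left` words before and `right` words after the node.
        for j in range(max(0, i - left), i):
            counts[(tokens[j], "L") if direction else tokens[j]] += 1
        for j in range(i + 1, min(len(tokens), i + 1 + right)):
            counts[(tokens[j], "R") if direction else tokens[j]] += 1

    # Cut-Off (assumed to act on joint frequency): keep only words that
    # occur more often than the specified value.
    return {key: n for key, n in counts.items() if n > cut_off}
```

With Direction switched on, each key becomes a (word, side) pair, so the left-hand and right-hand occurrences of the same word are counted, and could later be scored, separately.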
The Collocation Table
The table of collocations has five columns of information.
- #
- The numeric rank of that collocate according to the score allocated
by the evaluation function.
- Collocate
- The word occurring as a collocate.
- Joint Frequency
- The number of times that the collocate and the node
word occur together within the span.
- Total Frequency
- The total number of times that the collocate appears
within the corpora selected.
- Score
- The score allocated by the evaluation function.
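To make the five columns concrete, the following Python sketch assembles such a table from joint and total frequencies. It uses common textbook forms of mutual information and the t-score; the formulas the program actually applies may differ (for instance in how the span enters the expected frequency), and all names here (expected, mutual_information, t_score, build_table) are assumptions for illustration.

```python
import math


def expected(f_node, f_coll, n_tokens):
    # Expected joint frequency if node and collocate were independent.
    # Simplifying assumption: the span size is ignored.
    return f_node * f_coll / n_tokens


def mutual_information(joint, f_node, f_coll, n_tokens):
    # Textbook form: log2 of observed over expected frequency.
    return math.log2(joint / expected(f_node, f_coll, n_tokens))


def t_score(joint, f_node, f_coll, n_tokens):
    # Textbook form: (observed - expected) / sqrt(observed).
    return (joint - expected(f_node, f_coll, n_tokens)) / math.sqrt(joint)


def build_table(joint_freq, total_freq, f_node, n_tokens,
                evaluate=mutual_information):
    """Return rows of (#, collocate, joint frequency, total frequency, score)."""
    rows = [
        (word, joint, total_freq[word],
         evaluate(joint, f_node, total_freq[word], n_tokens))
        for word, joint in joint_freq.items()
    ]
    # Initial order: highest score first, as ranked by the evaluation function.
    rows.sort(key=lambda row: row[3], reverse=True)
    return [(rank, *row) for rank, row in enumerate(rows, start=1)]
```

Passing evaluate=t_score instead of the default swaps the evaluation function while leaving the other four columns unchanged.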
Sorting
When the table of collocations is
created it is sorted according to the scoring method of the evaluation
function. However, once the table has been created, you can sort it according
to any column by clicking on the respective column header bar. If you
press shift and click on the column header bar, the table will be sorted in
reverse order.
You can therefore sort the collocational data as follows:
- according to joint or total frequency, in normal or reverse order.
- in alphabetical or reverse alphabetical order (with the
collocate column header bar)
- in numerical or reverse numerical order (with the # column
header bar).
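The sorting behaviour described above can be mimicked with a short Python sketch. The row layout, the COLUMNS mapping and the function sort_table are hypothetical, and "normal order" is assumed here to mean ascending; the program may order frequency columns differently.

```python
from operator import itemgetter

# Hypothetical column positions in a row of
# (#, collocate, joint frequency, total frequency, score).
COLUMNS = {"#": 0, "collocate": 1, "joint": 2, "total": 3, "score": 4}


def sort_table(rows, column, reverse=False):
    """Return the rows sorted by one column.

    `reverse=True` plays the role of shift-clicking the column header.
    """
    return sorted(rows, key=itemgetter(COLUMNS[column]), reverse=reverse)


rows = [
    (1, "tea", 2, 2, 8.4),
    (2, "coffee", 1, 5, 6.0),
    (3, "the", 2, 100, 2.7),
]
by_word = sort_table(rows, "collocate")            # alphabetical
by_total_desc = sort_table(rows, "total", True)    # reverse numerical
```

Because Python's sort is stable, rows with equal values in the chosen column keep their previous relative order.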