8 KWIC concordance
There is a complete KWIC concordance based on the tagged LOB Corpus. The concordance was available on tape and microfiche (Hofland and Johansson 1986), but is no longer available.
8.1 Tapes and files
Code: | ASCII
Tracks: | 9
Density: | 1600 or 6250 bpi
Label: | none
Parity: | odd
Files: | 95 (see the index in 8.6)
EOF marks: | 1 after each file, 2 at end of tape
Record size: | 132
Blocking factor: | 60
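The record size and blocking factor above imply physical blocks of 132 x 60 = 7920 characters. The following is a minimal sketch of how such a block could be split back into records, assuming the block has already been read into memory as bytes. The function name `split_block` is hypothetical, and allowing a short final block (fewer than 60 records) is an assumption, not something the tape description states.

```python
RECORD_SIZE = 132       # characters per record, from the table above
BLOCKING_FACTOR = 60    # records per physical block, from the table above
BLOCK_SIZE = RECORD_SIZE * BLOCKING_FACTOR  # 7920 characters per full block


def split_block(block: bytes) -> list[str]:
    """Split one block into fixed-length ASCII records.

    Assumption: a block may hold fewer than 60 records (e.g. the last
    block of a file), but its length must be a multiple of 132.
    """
    if len(block) % RECORD_SIZE != 0:
        raise ValueError("block length is not a multiple of the record size")
    return [
        block[start:start + RECORD_SIZE].decode("ascii")
        for start in range(0, len(block), RECORD_SIZE)
    ]
```

Because the records are fixed-length, no line terminators are needed; the record boundaries follow from the byte offsets alone.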
Column | Contents
1-7 | Reference number (see 2.5)
8 | Blank
9-61 | Lefthand context
62-132 | Keyword + righthand context
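The column layout above can be turned into string slices. This is a minimal sketch, assuming each record is available as a Python string. The names `KwicRecord` and `parse_record` are hypothetical, and padding short lines to 132 characters is an assumption made so that trailing blanks stripped in transit do not break the slicing.

```python
from typing import NamedTuple


class KwicRecord(NamedTuple):
    reference: str          # columns 1-7
    left_context: str       # columns 9-61
    keyword_and_right: str  # columns 62-132


def parse_record(line: str) -> KwicRecord:
    """Slice one 132-column concordance record into its fields."""
    line = line.ljust(132)  # assumption: restore stripped trailing blanks
    return KwicRecord(
        reference=line[0:7].strip(),
        left_context=line[8:61],       # column 8 (index 7) is blank
        keyword_and_right=line[61:132],
    )
```

The 1-based column ranges in the table map to 0-based, end-exclusive slices, so columns 9-61 become `line[8:61]` and columns 62-132 become `line[61:132]`.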
8.3 Sorting
In the original LOB Corpus concordance (Hofland and Johansson 1979) the arrangement is by graphic word, and the occurrences under each key word are sorted alphabetically by righthand context. In the new concordance the occurrences under each graphic word are sorted first by tag and then, under each tag, by context: tag of key word +1, spelling of key word +1, tag of key word +2, spelling of key word +2. From key word +3 onwards the sorting is by spelling only. Characters are sorted in ascending ASCII order, with no distinction between upper and lower case: !"#$%&'()*+,-./0-9:;<=>?@A-Z[\]^_`{|}. Punctuation marks and other delimiters such as quotes and parentheses are skipped.
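The sorting rule for occurrences under one graphic word can be sketched as a sort key. This is an illustration only, not the program used to build the concordance. It assumes that each token is a (spelling, tag) pair, that "tag" in the first position means the tag of the key word, and that a delimiter is any token with no letter or digit in its spelling. The names `is_delimiter` and `sort_key` are hypothetical. Spellings are folded to upper case so that the characters [ \ ] ^ _ ` still sort after the letters, as in the sequence given above.

```python
Token = tuple[str, str]  # (spelling, tag); an assumed representation


def is_delimiter(spelling: str) -> bool:
    """Assumption: tokens without any letter or digit are skipped."""
    return not any(ch.isalnum() for ch in spelling)


def sort_key(keyword: Token, right_context: list[Token]) -> tuple[str, ...]:
    """Build a key: keyword tag, then tag and spelling of key word +1 and +2,
    then spelling only from key word +3 onwards."""
    _, keyword_tag = keyword
    words = [(s.upper(), t) for s, t in right_context if not is_delimiter(s)]
    key = [keyword_tag]
    for spelling, tag in words[:2]:
        key.extend([tag, spelling])
    key.extend(spelling for spelling, _ in words[2:])
    return tuple(key)
```

Passing this function as the `key` argument of `sorted` orders a list of (keyword, right context) pairs for one graphic word; Python compares the resulting tuples element by element, which matches the "first by tag, then by context" layering described above.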
The concordance starts with the letter A. Graphic words with a non-alphabetical initial (including enclitics like 'll and 's) are placed after Z, at the end of the concordance. Forms marked with the abbreviation code (\0) are placed at the end of the entry for the relevant graphic word.
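The ordering of graphic words described here can be approximated with a two-part key: a flag that pushes non-alphabetical initials behind Z, followed by the case-folded spelling. This is a rough sketch under the assumption that "alphabetical" means an ASCII letter. The name `graphic_word_key` is hypothetical, and the special placement of forms marked with the abbreviation code is not modelled.

```python
def graphic_word_key(word: str) -> tuple[int, str]:
    """Sort words with a letter initial first, all others after Z."""
    first = word[:1]
    starts_with_letter = first.isascii() and first.isalpha()
    # Upper-casing gives case-insensitive comparison in ASCII order.
    return (0 if starts_with_letter else 1, word.upper())


words = ["'ll", "zebra", "Apple", "#", "and", "a'"]
ordered = sorted(words, key=graphic_word_key)
```

With this key, `a'` sorts before `and` because the apostrophe precedes the letters in ASCII, and `#` and `'ll` land after `zebra`.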
8.5 Frequencies
At the beginning of each graphic word there is a frequency survey giving the following information: (1) total frequency of each tag found with the word, (2) relative proportion of each tag (expressed in %), (3) relative frequency of each tag (expressed in words per million), and (4) absolute and relative frequencies of each tag in the individual text categories. See the example above.
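Items (1) to (3) of the frequency survey are simple ratios. The sketch below shows one way to compute them from per-tag counts for a single graphic word. The function name `frequency_survey` and the `corpus_size` parameter are hypothetical; the exact word total used as the per-million denominator is not stated here, so it is left as an input.

```python
def frequency_survey(
    tag_counts: dict[str, int], corpus_size: int
) -> dict[str, tuple[int, float, float]]:
    """For each tag return (total frequency, % of the word's occurrences,
    occurrences per million words of the corpus)."""
    word_total = sum(tag_counts.values())
    return {
        tag: (
            count,
            100 * count / word_total,          # relative proportion in %
            1_000_000 * count / corpus_size,   # assumed per-million basis
        )
        for tag, count in tag_counts.items()
    }
```

For example, a word found 30 times as NN and 10 times as VB in a corpus of 1,000,000 words would get 75% and 25% as proportions, and 30 and 10 per million as relative frequencies. Item (4) would repeat the same calculation with counts and sizes restricted to each text category.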
8.6 Index to the KWIC concordance (tapes)
File | First word in file
1 | Introduction
2 | a_&FW
3 | a'_ABN
4 | advance_JJB
5 | all-age_JJB
6 | an'_CC
7 | and/or_CC
8 | area_NN
9 | Asaph_NP
10 | attitudinising_NN
11 | B_ZZ
12 | Be*?2gue*?2_NP
13 | believing_VBG
14 | Borodin's_NP$
15 | Butagas_NP
16 | Canaanites_NNPS
17 | character-actors_NNS
18 | coalition_NN
19 | conducting_NN
20 | Coulomb_NP
21 | d_ZZ
22 | departs_VBZ
23 | discusses_VBZ
24 | draped_JJ
25 | einen_&FW
26 | estate-bottled_JJ
27 | Expresso_NP
28 | Feltri_NP
29 | fluttering_NN
30 | forage_NN
31 | from-abroad_JJB
32 | g_ZZ
33 | Go"sta_NP
34 | Guinevere_NP
35 | hard-core_JJB
36 | he's_NN$
37 | high-backed_JJ
38 | histamine_NN
39 | I_&FW
40 | Ifield_NP
41 | in*:2**:_NNU
42 | interviewed_VBD
43 | Isa_NP
44 | it's_NN$
45 | killer_NN
46 | L_NC
47 | leggings_NNS
48 | living-room_NN
49 | Madrassi_NP
50 | Matthew_NP
51 | microbe_NN
52 | more-than-life-size_JJB
53 | mycological_JJ
54 | news-hounds_NNS
55 | not-necessary_JJB
56 | O_&FW
57 | off_IN
58 | on-the-spot_JJB
59 | ora_&FW
60 | overestimate_VB
61 | pay-off_NN
62 | placid_JJ
63 | precipice_NN
64 | prosperous_JJ
65 | R_ZZ
66 | referee_NN
67 | resurgent_JJ
68 | rustled_VBD
69 | scientifically_RB
70 | set-back_NN
71 | shoulda_MD
72 | smallest_JJT
73 | souls_NNS
74 | steel-and-glass_JJB
75 | such-and-such_ABL
76 | t_PP3
77 | Thangue_NP
78 | thatch_NN
79 | the*?2a*?5tre_&FW
80 | thereabouts_RB
81 | Thistle_NP
82 | tiara_NN
83 | TO"lz_NP
84 | triumphal_JJ
85 | U_ZZ
86 | US-initiated_JJ
87 | voting_NN
88 | Wasdale_NP
89 | werewolves_NNS
90 | whichever_WDT
91 | wing_NN
92 | women's_NNS$
93 | years'_NNS$
94 | #_&FO
95 | 193,174_CD