1 The LOB Corpus
The Lancaster-Oslo/Bergen (LOB) Corpus is a million-word collection of present-day British English texts, compiled under the direction of Geoffrey Leech, University of Lancaster, and Stig Johansson, University of Oslo, in collaboration with Knut Hofland, Norwegian Computing Centre for the Humanities, Bergen. Like its American counterpart, the Brown Corpus (see Francis and Kucera 1979), it contains 500 text samples of approximately 2,000 words each, distributed over 15 text categories:
| Text category | Description | Number of samples (Brown) | Number of samples (LOB) |
|---|---|---|---|
| A | Press: reportage | 44 | 44 |
| B | Press: editorial | 27 | 27 |
| C | Press: reviews | 17 | 17 |
| D | Religion | 17 | 17 |
| E | Skills, trades and hobbies | 36 | 38 |
| F | Popular lore | 48 | 44 |
| G | Belles lettres, biography, essays | 75 | 77 |
| H | Miscellaneous (government documents, foundation reports, industry reports, college catalogue, industry house organ) | 30 | 30 |
| J | Learned and scientific writings | 80 | 80 |
| K | General fiction | 29 | 29 |
| L | Mystery and detective fiction | 24 | 24 |
| M | Science fiction | 6 | 6 |
| N | Adventure and western fiction | 29 | 29 |
| P | Romance and love story | 29 | 29 |
| R | Humour | 9 | 9 |
| Total | | 500 | 500 |
For more details, see the LOB Corpus Manual of Information (Johansson et al. 1978). The present manual deals with the tagged versions of the corpus; for more information on the sampling and sources of the texts, the user must turn to the original manual.
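The figures in the category table can be checked with a few lines of Python. The sketch below is only an illustration and not part of the corpus distribution: the names `SAMPLES`, `totals`, and `differing` are hypothetical, the counts are copied from the table, and the word count assumes the nominal sample length of 2,000 words stated above.

```python
# Samples per text category, copied from the table: (Brown, LOB).
SAMPLES = {
    "A": (44, 44),  # Press: reportage
    "B": (27, 27),  # Press: editorial
    "C": (17, 17),  # Press: reviews
    "D": (17, 17),  # Religion
    "E": (36, 38),  # Skills, trades and hobbies
    "F": (48, 44),  # Popular lore
    "G": (75, 77),  # Belles lettres, biography, essays
    "H": (30, 30),  # Miscellaneous
    "J": (80, 80),  # Learned and scientific writings
    "K": (29, 29),  # General fiction
    "L": (24, 24),  # Mystery and detective fiction
    "M": (6, 6),    # Science fiction
    "N": (29, 29),  # Adventure and western fiction
    "P": (29, 29),  # Romance and love story
    "R": (9, 9),    # Humour
}

NOMINAL_SAMPLE_LENGTH = 2000  # approximate words per sample (assumption from the text)


def totals(samples):
    """Return the total number of samples as (Brown, LOB)."""
    brown = sum(b for b, _ in samples.values())
    lob = sum(l for _, l in samples.values())
    return brown, lob


def differing(samples):
    """Return the category codes whose sample counts differ between the corpora."""
    return sorted(cat for cat, (b, l) in samples.items() if b != l)


brown_total, lob_total = totals(SAMPLES)
print(brown_total, lob_total)                 # 500 500
print(lob_total * NOMINAL_SAMPLE_LENGTH)      # 1000000
print(differing(SAMPLES))                     # ['E', 'F', 'G']
```

Both corpora sum to 500 samples, which at roughly 2,000 words per sample gives the nominal million words. The two distributions differ only in categories E, F, and G.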