1 The LOB Corpus

The Lancaster-Oslo/Bergen (LOB) Corpus is a million-word collection of present-day British English texts, compiled under the direction of Geoffrey Leech, University of Lancaster, and Stig Johansson, University of Oslo, in collaboration with Knut Hofland, Norwegian Computing Centre for the Humanities, Bergen. Like its American counterpart, the Brown Corpus (see Francis and Kucera 1979), it contains 500 text samples of approximately 2,000 words each, distributed over 15 text categories:

Text categories		Number of samples in each category
		Brown Corpus	LOB Corpus
A	Press: reportage	44	44
B	Press: editorial	27	27
C	Press: reviews	17	17
D	Religion	17	17
E	Skills, trades and hobbies	36	38
F	Popular lore	48	44
G	Belles lettres, biography, essays	75	77
H	Miscellaneous (government documents, foundation reports, industry reports, college catalogue, industry house organ)	30	30
J	Learned and scientific writings	80	80
K	General fiction	29	29
L	Mystery and detective fiction	24	24
M	Science fiction	6	6
N	Adventure and western fiction	29	29
P	Romance and love story	29	29
R	Humour	9	9
Total		500	500
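
The figures in the table can be cross-checked with a few lines of Python. The following is only an illustrative sketch: the dictionary name SAMPLES and the constant WORDS_PER_SAMPLE are assumptions introduced here, not part of the corpus distribution, and the counts are simply copied from the table above. It sums each column, lists the categories where the two corpora differ, and estimates the size of the corpus from the nominal sample length.

```python
# Sample counts per text category, copied from the table above.
# Tuple order: (Brown Corpus, LOB Corpus). Names are illustrative only.
SAMPLES = {
    "A": (44, 44),
    "B": (27, 27),
    "C": (17, 17),
    "D": (17, 17),
    "E": (36, 38),
    "F": (48, 44),
    "G": (75, 77),
    "H": (30, 30),
    "J": (80, 80),
    "K": (29, 29),
    "L": (24, 24),
    "M": (6, 6),
    "N": (29, 29),
    "P": (29, 29),
    "R": (9, 9),
}

# Nominal sample length; actual samples are only approximately this long.
WORDS_PER_SAMPLE = 2000

# Column totals for the two corpora.
brown_total = sum(brown for brown, _ in SAMPLES.values())
lob_total = sum(lob for _, lob in SAMPLES.values())

# Categories whose sample counts differ between Brown and LOB.
differing = sorted(cat for cat, (brown, lob) in SAMPLES.items() if brown != lob)

# Rough corpus size implied by the nominal sample length.
approx_words = lob_total * WORDS_PER_SAMPLE

print(brown_total, lob_total)  # 500 500
print(differing)               # ['E', 'F', 'G']
print(approx_words)            # 1000000
```

Both columns sum to 500, the two corpora differ only in categories E, F and G, and 500 samples of about 2,000 words give the approximate figure of one million words.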

For more details, see the LOB Corpus Manual of Information (Johansson et al. 1978). The present manual deals with the tagged versions of the corpus. For more information on sampling and sources of the texts, the user must turn to the original manual.