The LOB tag set, though larger and more delicate than that used for the Brown Corpus, represents a fairly broad classification. Greater delicacy had to be sacrificed in order to achieve the goal of tagging the entire LOB Corpus.
3.1 An overview of the LOB tag set
Base tags | Description | Reference
A... | determiner/pronoun |
BE... | be (lexical verb or aux) |
CC | coordinating conjunction |
CD... | cardinal numeral |
CS | subordinating conjunction |
DO... | do (lexical verb or aux) |
DT... | determiner/pronoun |
EX | existential there |
HV... | have (lexical verb or aux) |
IN | preposition |
J... | adjective |
MD | modal auxiliary |
N... | noun |
OD... | ordinal numeral |
P... | pronoun |
QL... | qualifier |
R... | adverb |
TO | infinitival to |
UH | interjection |
VB... | lexical verb |
W... | WH-word |
XNOT | not |
ZZ | letter |
Suffixes | Description | May occur with
A | nominative | determiners, pronouns, nouns, numerals
O | accusative |
I | singular or plural |
S | plural |
$ | possessive |
R | relative |
D | past tense | verbs
G | present participle, gerund |
N | past participle |
Z | 3rd person singular |
R | comparative | adjectives, adverbs
T | superlative |
See further the list of tags in Appendix 4.
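The overview above can be read as a lookup from a tag to its broad word class. The following is a minimal sketch, not part of the tagging scheme itself: the names `PREFIX_TAGS`, `EXACT_TAGS` and `broad_class` are hypothetical, the sample tags `VBD` and `DTI` are illustrative inputs, and longest-prefix matching is an assumed way of resolving the base tags that are listed with `...`.

```python
# Base tags listed with "..." in the overview, treated here as prefixes
# (an assumption made for this sketch).
PREFIX_TAGS = {
    "A": "determiner/pronoun",
    "BE": "be (lexical verb or aux)",
    "CD": "cardinal numeral",
    "DO": "do (lexical verb or aux)",
    "DT": "determiner/pronoun",
    "HV": "have (lexical verb or aux)",
    "J": "adjective",
    "N": "noun",
    "OD": "ordinal numeral",
    "P": "pronoun",
    "QL": "qualifier",
    "R": "adverb",
    "VB": "lexical verb",
    "W": "WH-word",
}

# Base tags listed without "...", matched only as whole tags.
EXACT_TAGS = {
    "CC": "coordinating conjunction",
    "CS": "subordinating conjunction",
    "EX": "existential there",
    "IN": "preposition",
    "MD": "modal auxiliary",
    "TO": "infinitival to",
    "UH": "interjection",
    "XNOT": "not",
    "ZZ": "letter",
}


def broad_class(tag):
    """Return the broad description for a tag, or None if no base tag matches."""
    base = tag.rstrip('"')  # drop a trailing ditto mark, if any
    if base in EXACT_TAGS:
        return EXACT_TAGS[base]
    # Try the longest candidate prefix first, so "DT" is preferred over a
    # shorter prefix should one ever overlap.
    for length in range(len(base), 0, -1):
        description = PREFIX_TAGS.get(base[:length])
        if description is not None:
            return description
    return None


for sample in ["VBD", "DTI", "PPLS", 'IN"', "XNOT"]:
    print(sample, "->", broad_class(sample))
```

Keeping the exact tags apart from the prefix tags prevents a tag such as `IN` from being treated as a family of tags, which the overview does not indicate.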
3.2 Differences with respect to the Brown tag set
The principal differences between the Brown and LOB tag sets are:
In addition, contractions are split up in the LOB Corpus (see 5.3 and the end of 2.4), whereas the Brown Corpus uses compound tags (see Francis and Kucera 1979). These differences in the tag set, together with some variation in how the tags are applied (in particular, a somewhat wider use of non-participle tags for -ed and -ing forms; cf. 7.3 and 7.4), mean that comparisons between the tagged LOB Corpus and the tagged Brown Corpus must be made with caution.
3.3 Ditto tags
as to | IN IN" | (complex preposition)
each other | PPLS PPLS" | (complex reflexive pronoun)
so as to | TO TO" TO" | (complex infinitive marker)
Such idiom tagging makes it possible to avoid an arbitrary or counter-intuitive tagging at the level of the single word. See further 7.2.
The ditto tags occurring in the tagged corpus are included in the list in Appendix 4. The total number of words assigned such tags is fairly low (approx 5,500).
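As a rough illustration of how ditto tags might be consumed, the sketch below merges each word carrying a ditto tag (a tag ending in `"`) into the word before it, so that the three words of "so as to" become a single unit tagged TO. The `(word, tag)` pair representation and the function name `merge_ditto` are assumptions made for this example and do not reflect the corpus file format.

```python
def merge_ditto(tokens):
    """Merge ditto-tagged words into the preceding unit.

    tokens: list of (word, tag) pairs; a tag ending in '"' is taken to be a
    ditto tag that continues the unit begun by the previous word.
    """
    units = []
    for word, tag in tokens:
        if tag.endswith('"') and units:
            prev_word, prev_tag = units[-1]
            # Extend the previous unit and keep its (non-ditto) tag.
            units[-1] = (prev_word + " " + word, prev_tag)
        else:
            # Ordinary tag, or a stray ditto tag with nothing before it,
            # which is passed through unchanged.
            units.append((word, tag))
    return units


sample = [
    ("so", "TO"), ("as", 'TO"'), ("to", 'TO"'),
    ("each", "PPLS"), ("other", 'PPLS"'),
]
print(merge_ditto(sample))
```

The sketch does not check that a ditto tag agrees with the tag of the preceding word; a stricter version could compare the two after stripping the ditto mark.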