7.15 Conjunction vs preposition

A good number of words can be either conjunctions or prepositions: after, before, for, etc. Problems arise particularly with as and than, which are dealt with separately at the end of the section.

CC vs IN

As well as was originally tagged CC CC" CC". But it is not necessarily an idiom. Compare:

  1. There was a whip to goad as well as (CC CC" CC") a carrot to entice. J55:101
  2. You know as well as (QL RB CS) I do Rose didn't kill herself. L05:146
  3. ... later we might not see one which we like as well as (QL RB IN) this one. P16:122

A second problem is that the idiom as well as appears in positions where coordinating conjunctions are excluded, as in:

  4. ... to the famous 'Evergreen' which, as well as being made into a film, ran for two years. C16:165

Such occurrences were tagged as complex prepositions (IN IN" IN"); cf the related preposition besides. An alternative would have been to treat all occurrences of the idiom as well as as IN.

    CS vs IN

    Some problematic words are: as, but (see the end of 7.14), except, like, till, until, than. The general rule has been to tag a form as IN before a following complement consisting of a noun, pronoun, or noun phrase (there should be no need to illustrate this). IN can also precede numerals and -ing clauses,35 as in:

  5. ... the programme would not start until eleven. G49:48
  6. If the night is cold you may feel like giving your guests a hot punch. E19:179

    The tag IN was also used before certain adverbs of time and place (RN) which can clearly be preceded by prepositions: until now (cf up to now), except here (cf from here), etc. Prepositions can precede other prepositions in some clear cases: from under the clouds (F11:153), came to nearer home (G22:7), etc. But with forms which are either CS or IN, we generally opted for CS in this position.

    CS introduces finite clauses (again there should be no need for illustration) and is also used before adjectives, adverbs, prepositional phrases, and non-finite verb constructions (except -ing forms). In these cases it is generally possible to paraphrase with a finite clause. Examples:

  7. Do they quarrel? - 'Of course,' said Brian. 'Like mad sometimes...' F14:73
  8. ... he would not be home until late. N21:12
  9. First, nobody can transfer power, except in a purely legal sense. G73:12
  10. ... there is little for the west to do except stand firm. B02:53
  11. Bacon guided the early design stages of Mayfly until relieved by Sueter... J40:116

    Idiom tagging was used in some cases where neither CS nor IN applies; see 7.13, the end.

    The tag CS was used somewhat more widely with as and than than with the other words of the same problem group. There are some inconsistencies of tagging, in particular before -ing forms, adverbs, and clauses introduced by conjunctions and WH-words.

    As

    CS was assigned when as introduces finite clauses and when it occurs before adjectives, adverbs, prepositional phrases, and non-finite verb constructions:

  12. There could be no sharp division as one believed when one was young. K18:64
  13. The car will call for you as usual. P08:172
  14. The sting is, as often, in the tail. G72:146
  15. He looked at her, noting, as for the first time, the pansy blue of the eyes... P03:153
  16. The official dates of University terms as published in the calendar apply to all students. H29:123
  17. How could he have been so careless as to leave it here? N09:130
  18. Bleak are recorded as having hybridised naturally with chub and roach. F38:142

    The use of CS before -ing forms is questionable, since these can occur as complements of prepositions.

    IN was chosen in positions characteristic of prepositions, i.e. before nouns, pronouns, and noun phrases (provided that they are not part of clause-like structures):

  19. As characters, poets range from rhyming layabouts to saintly travellers... A19:215
  20. ... she knew he was as obstinate as herself. N17:75
  21. You can look as fit as a fiddle and yet be bloodless. F33:142

    IN also applies in examples like:

  22. It came into existence as early as 1948... F15:78
  23. ... as it intended to do as recently as even fifty years ago. G58:58
  24. ... the character of inflation over such a short period as say 1956-57. J44:167
  25. ... have a second strike capability as extensive as what the Soviets can deliver by striking first. G75:140
  26. ... debating such matters as who had the right to impose taxes... G53:90

    The principles outlined above lead to different taggings in examples like:36

  27. ... they regard apartheid as (CS) evil and indefensible. B01:215
  28. Most people probably regard tiredness as (IN) a purely physical thing. D06:70

    Neither CS nor IN seemed applicable in:

  29. A nice clean decent desertion, and she never so much as turned her eyes on any other bloke! L14:31

    Here we resorted to idiom tagging (RB RB" RB"); see also 7.13, example 26.

    As can further qualify a following adjective or adverb and is then tagged QL. This tag is unproblematic (for a minor problem, see 7.10, example 76); typical occurrences of QL are found in some of the examples above (20, 21, 22, 23, 25).

    Finally, as occurs in many idioms:

    CC CC" CC"

    as well as (see the beginning of 7.15)

    CS CS" (CS" CS")

    in as much as, inasmuch as, in so far as, insofar as, so as, such as (see 7.12, ABL), as if, as though

    IN IN" (IN")

    as against, as between, as for, as from, as of, as opposed to, as regards, as to, as versus, as well as (see the beginning of 7.15), such as (see 7.12, ABL)

    RB RB" (RB")

    as good as (see 7.13, example 25), as well (=also), as it were, as yet, so much as (see above, example 29)

    TO TO" TO"

    so as to

    Note that as far as, as long as, etc were not idiom tagged; see 7.14 under 'idioms'.
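The idiom lists above can be read as a lookup from a word sequence to one or more candidate taggings, where the first word carries the base tag and each later word carries the same tag followed by a ditto mark ("). The following is a minimal sketch under that reading, not the procedure actually used for the corpus. The names IDIOM_TAGS, ditto_tags, and idiom_readings are hypothetical, and the table holds only a selection of the idioms listed above.

```python
# Hypothetical lookup table: idiom (as a tuple of lower-case words) -> base tags.
# Only a selection of the idioms listed in this section is included.
IDIOM_TAGS = {
    ("as", "well", "as"): ["CC", "IN"],   # coordinator or complex preposition
    ("such", "as"): ["CS", "IN"],
    ("as", "if"): ["CS"],
    ("as", "though"): ["CS"],
    ("in", "so", "far", "as"): ["CS"],
    ("as", "for"): ["IN"],
    ("as", "regards"): ["IN"],
    ("as", "to"): ["IN"],
    ("as", "yet"): ["RB"],
    ("as", "it", "were"): ["RB"],
    ("so", "much", "as"): ["RB"],
    ("so", "as", "to"): ["TO"],
}


def ditto_tags(base, length):
    """Expand a base tag over `length` words: first word plain, the rest with a ditto mark."""
    return [base] + [base + '"'] * (length - 1)


def idiom_readings(words):
    """Return every candidate idiom tagging for `words` (empty list if not in the table)."""
    key = tuple(w.lower() for w in words)
    return [ditto_tags(base, len(key)) for base in IDIOM_TAGS.get(key, [])]


print(idiom_readings(["as", "well", "as"]))
print(idiom_readings(["as", "far", "as"]))  # not idiom tagged, so no readings
```

Where a sequence has more than one reading (as well as, such as), the choice between them still depends on the syntactic criteria discussed in this section; the lookup only enumerates the candidates.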

    Than

    The two major tags are CS and IN. They were distinguished according to the same principles as with as. Examples:

    CS

    Bertie was much more intelligent than most people supposed. K13:60
    Tonight, she was even more tired than usual. N17:100
    ... is now less suitable for intending schoolteachers than formerly. B11:75
    ... more by way of starting a conversation with him than from a desire to know. K10:112
    A paper in such a position should do more than merely please its readers. B19:148
    This proved more difficult than anticipated. G30:78
    That, at least, is better than reading the whole thing. F03:134

    IN

    It was obvious that he had come to say more than good-bye. K18:201
    ... the wind has dropped to no more than a gentle breeze. B20:167
    ... are considerably lower than last year. H27:72
    But there was more to it than that. L08:61
    ... more than nine hundred fell captive to the Spaniards. F25:132
    ... as the natural term of life for a house, rather than, say, eighty years... J47:16

    The sequence more than was idiom-tagged (RB RB") before prepositional phrases, adjectives, adverbs, and verb forms in examples like:37

  39. ... the book is more than worth it. C11:220
  40. Life was in those years more than busy. G04:122
  41. I had more than half expected that... M03:96
  42. ... this was more than offset by a reduction in our tax liabilities. H27:70

Cf further 7.13, the end.

7.16 WH-words

Originally WH-words were tagged according to their syntactic function in the clause:

WDT

determiner (including pronoun; cf 7.12)

WP

pronoun (with subdivisions according to case)

WRB

adverb

No distinction was made between interrogative and relative uses. To make the classification somewhat more refined, we added an R to the tag for relative uses of the following words: which, that, who, whom, whose.

Note the tagging of the following words:

when

WRB (all uses, including the conjunction)

where

WRB (all uses)

whereas

CS

whereupon

CS

whether

CS

As regards the distinction between W-tags and CS, see further 7.14.

But and as, some uses of which were traditionally analysed as relative, are never given W-tags. See the treatment of these words in Sections 7.14 and 7.15 respectively.

7.17 Numerals

Cardinals are tagged CD, whether they are written as digits or are spelled with ordinary letters. CD includes dozen and zero. The tagging does not vary with syntactic position. There is a special tag for one: CD1. This applies to all uses of one, except when it is part of a sequence with idiom tagging: one another PPLS PPLS", no one PN PN", some one PN PN".

Inflected forms keep their CD tags, with the addition of the usual inflection markers: $ for genitive and S for plural. Examples: one's CD1$, millions CDS, 1930's CDS.
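The inflection markers can be illustrated with a small helper. This is a sketch only: the function name cardinal_tag is hypothetical, it assumes the caller already knows the word is a cardinal, and it guesses from the spelling alone that 's after digits marks the plural (1930's) while 's after a spelled-out numeral marks the genitive (one's).

```python
def cardinal_tag(word):
    """Build a CD-family tag for a word already known to be a cardinal.

    Assumptions (not stated as rules in the text): 's after digits is read
    as a plural marker, 's after letters as a genitive marker, and a plain
    final -s as a plural marker.
    """
    stem = word.lower()
    plural = False
    genitive = False

    if stem.endswith("'s"):
        stem = stem[:-2]
        if stem[:1].isdigit():
            plural = True       # e.g. 1930's
        else:
            genitive = True     # e.g. one's
    elif stem.endswith("s"):
        stem = stem[:-1]
        plural = True           # e.g. millions

    tag = "CD1" if stem == "one" else "CD"   # special tag for one
    if plural:
        tag += "S"
    if genitive:
        tag += "$"
    return tag


for form in ["one", "one's", "millions", "1930's", "dozen"]:
    print(form, cardinal_tag(form))
```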

A CD form may include punctuation marks: 2.1, 1,000, 2/3, etc. But note the special tag for a hyphenated pair of cardinals: 1959-60 CD-CD, 2-1 (football score) CD-CD, pp 3-7 CD-CD, etc. The hyphenated tag is only used where the hyphen is equivalent to a preposition. Thus the regular CD tag is assigned in an example like: from opening at eight till closing at one-thirty (K07:209).

CD also applies to sequences like: 5, +7, -7. But + and - when occurring on their own are tagged IN; see 7.22. Formulas and more complex mathematical expressions are tagged &FO. The same applies to forms with subscripts and superscripts. See 7.22.

Combinations of numbers and ordinary words are frequently restricted to attributive position and are then tagged JJB (cf 7.8): 14-year-old (adj),38 18-bedroom, post-1918, pre-1960, etc. Examples of other types of tagging are: end-1960 CD, niobium-95 NN, 14-year-old (noun) NN.

Numerals frequently combine with units of measurement. Sequences of $ or *+ (£) plus numeral (without an intervening space) are tagged NNU; see 7.19.

Simple combinations of letters and digits (except ordinals; see below) are generally assigned the tag appropriate to the initial character: A20 ZZ, 10a CD, etc.

Ordinals are OD, whether they are spelled with ordinary letters or written as digits plus st, nd, rd, or th. Provision was made in the tagging scheme for ordinal plus genitive, but no examples of the following kind occurred in the text: George the Third's reign, the third's hat was brown. CD1$ was assigned in: in Charles I's day (F42:60).

The tag OD is assigned when the form modifies a noun or is used in a nominal position and is immediately preceded by the definite article (in which case a head noun is usually recoverable from the context). Examples:

  1. I struck the first blow. N23:91
  2. She was his second alibi, of course. The first was the television set... L01:59
  3. Among the first to react was the headmaster, Temple, himself. F28:160

The tag NN is assigned elsewhere in nominal positions, as in:

  4. ... now accounts for nearly a third of all local government building. B26:64
  5. ... the distance between the top tenth and the bottom tenth of the data... J19:145

Plural forms are NNS (except in hyphenated fractions; see below): two thirds, nine tenths, etc.

When ordinals occur in adverbial positions, they are tagged RB. Examples:

  6. He was against the man who first took up arms. G13:102
  7. I think this attitude is short-sighted. First, nobody can transfer power, except in a purely legal sense. G73:11
  8. Robert was left to play third whether he liked it or not. P14:34

OD was kept in cases like: second best, third best. Note the idiom tagging of: at first RB RB". But from the first is tagged IN ATI OD.

Fractions are given CD tags, whether they are written with digits (as regards the representation of fractions in the original Lob Corpus, see Johansson et al 1978:32) or as hyphenated words: 1/2 CD, one-half CD, one-and-a-half CD, one-quarter CD, two-fifths CDS, etc. Note the distinction between examples like: one-tenth CD, one tenth CD1 NN. For the distinction between ABN and NN with half, see 7.12.

7.18 Interjections

Most of the words tagged as interjections (UH) can be readily identified as such through their phonological/orthographic features and/or their isolation from the surrounding syntactic constructions (often marked by punctuation). Typical examples are response words (ah, aye, h'm, m, mm, no, oh, OK, okay, sure, ummm, yeah, yes), exclamations (ach, alas, boo, boy, bravo, cor, damn, gee whiz, gosh, ha, hell, hurrah, oh dear, wow), and greetings (good-bye, good-morning, goodnight, hallo, hello, hullo, hi, welcome). There are also hesitation signals (er, well), appeals to the addressee (eh, please), directives (hush, lo), and onomatopoeic expressions (um-chink).

Phrases are not tagged UH; thus, for example, good morning is JJ NN and thank you VB PP2. But there may be sequences of interjections (often separated by punctuation marks), e.g.: oh dear, oh hell, oh no, oh please, oh sure, oh well, oh yes.

Individual words which can easily be parts of larger constructions (with the same meaning) are not tagged UH: (that's) good, (I'm) sorry, (many) thanks, (my) God.

Problems sometimes arise with words which can either be interjections or belong to some other class. The interjections please and well are, for example, not always marked by features of punctuation:

  1. 'Please come in,' he said. K08:161
     'Do please tell Pepita. ...' P08:110
  2. 'Well 10 is twice 5 and 40 is twice 20.' J24:191
     'Well Jim I believe you, I don't like it.' L15:75

Please and well can nevertheless easily be identified as interjections through their position in the sentence.

Some words for greetings can be nouns as well as interjections. Good-bye was tagged as a noun in examples like: a sad goodbye, it's good-bye, her goodbyes, bid/kiss/say/wave good-bye. The tagging of words for greetings sometimes varies with punctuation:

  3. ... when Babba had shooed her upstairs, he'd said goodnight (NN). L06:167
  4. Adrian excused himself, said 'goodnight' (UH) with a meaningful glance... P08:13

Adieu and farewell only occur as nouns in the Lob Corpus. There is a problem with welcome, which can be JJ (you are welcome), NN (a royal welcome), and VB (we welcome you) as well as UH. Note the following examples:

  5. Welcome to London. K05:86
  6. 'Ettore! Welcome! Welcome! Welcome home!' N25:159

Although welcome is here followed by an adverbial of direction, it was tagged UH.

Boy is normally a noun and sure an adjective, but they can also be interjections:

  7. What else can a man do? - Boy (UH), are you kidding! K07:32
  8. 'Will you manage, boy (NN)?' N24:180
  9. 'Sure (UH),' Frank agreed absently. N09:94
  10. 'You sure (JJ)?' - 'Sure (UH) I'm sure (JJ)!' L17:102

Boy in (7) is an interjection; the utterance is not addressed to a boy. In (8), however, boy is a form of address. Sure in (9) is equivalent to yes. Note the contrast between the two uses of the word in (10).

OK/O.K./okay can either be JJ or UH (there are no instances in the text of the verb). Compare:

  11. 'Everything OK (JJ)?' N16:159
  12. At first it was all okay (JJ). Ui:50
  13. Asked what they were doing they said they were looking for a motor-cycle, but when further questioned, Chapman said: 'O.K. (UH) They've found us out.' A35:214
  14. '... If I am free, though, I'll give you a ring, O.K. (UH)?' - 'O.K. (UH),' she replied readily. P03:73

In (11) and (12) the forms are integrated syntactically into a larger construction, and the correct tag is therefore JJ. (13) is a clear example of the interjection; there is syntactic and orthographic separation, and the form cannot be part of a larger construction with the same meaning (cf above). (14) is less clear, as we could conceivably expand the two examples as 'Is that OK' and 'That is OK', respectively. Nevertheless, the tag chosen was UH; this is a reflection of our normalcy principle (cf Section 6).

Swear-words are only tagged UH when they are not integrated syntactically with the surrounding constructions. Compare:

  15. 'Oh, hell (UH)!' was all Gus said at first. P09:33
  16. 'Wait a minute,' Ben said. 'She'll get down when you empty that carbine.' The big man looked hard at Roan. 'Empty hell (UH),' he said. N14:187
  17. 'What the hell (NN) are you doing?' A35:168
  18. The hell (NN) with Adriana Pavone! N22:120
  19. ... what is your married life going to be like? Hell (NN).' Doc answered for her. 'Just hell (NN) ...' P02:144

Bearing in mind our definition of interjections, these examples are quite clear. But note that punctuation gives misleading clues in (16) and (19); these are examples where the automatic tagging programmes invariably fail. Another swear-word which can have more than one tag is damn. Examples:

  20. ... ask what had happened to their damn (JJ) bus. M04:47
  21. I didn't give a damn (NN). N22:122
  22. Well, damn (VB) it, I hardly knew you. L21:78
  23. 'Damn (UH),' I said. N11:114
  24. 'Oh, damn (UH), what's that now?' N24:163

The following related words were never tagged UH: damned (JJ, RB, VBN), darn (JJ, RB), darned (JJ, RB), darndest (JJT), durned (RB, VBN).39

7.19 Abbreviations

An abbreviation is marked in the original Lob Corpus by the prefix \0 (as in \0Mr), unless it is part of a sequence of abbreviations or an abbreviated expression, in which case the whole sequence is bracketed: {0B.A.}, {0U.S.}, etc. See Johansson et al (1978:30f). In the tagged text all abbreviations are preceded by \0. There is also a marker under 'special information' in the vertical version of the text; see 2.6. Another change is that abbreviation points are deleted at the end of words (so \0Mr. becomes \0Mr). In the examples below we will omit abbreviation points and the abbreviation marker \0.

The most common tags used with abbreviations are NN, NP... and NNU(S); see below. In other words, abbreviations usually function as nouns. Some examples of the various types of tags are:

CD

m (=million)

IN

nr, v, vs

JJ

jun, mod, sep

JNP

Inc, Ltd

NN

ch, Cmnd, Cons, fig, IQ, Lib, LP, MS, no, para, pl, PT (=Physical Training), TV, VIP, vol, WC

NNS

figs, mod cons

NNU

cm, ft, in, mph, sec, yd, DM (currency)

NNUS

galls, gns, hrs, ins, lbs, mins, pts, yds

NNP

Co (=Company), Lat (=Latin)

NP

CP Snow, USA, YMCA, TUC, London SW1, Ps (=Psalm)

NPS

Pss (=Psalms)

NPL

Co (=County), Is, Rd, Sq, St (=Street)

NPT

Mr, Mrs, Rev, Rt Hon, Sec, Sgt, MP, BA, PhD, CBE, St (=Saint)

NPTS

MPs, Cllrs

NR

Nov, 20 *@ W (*@=degree symbol, W=West)

RB

AD, BC, am, pm, pa (=per annum), eg, ie, viz, clo, approx, c (=circa), etc, &c, f, ff

UH

OK

VB

v below, cf

The same form may correspond to more than one tag. Examples:

BC
    NP     Victoria, BC
    RB     600 BC

C, c
    NNU    a 100 c Panax scaler
    NP     C P Snow, a temperature below 30 *@ C
    NPT    D C Hammond (=Detective Constable)
    RB     c 1700

M, m
    CD     about 40 m composition bricks
    JJ     case 118 m (=male)
    NN     qualified in D and M (=Driving and Maintenance)
    NNU    2 m high
    NP     T M Banks
    NPT    M Larsonnier

Rev
    NP     (=Revelations)
    NPT    (=Reverend)

St
    NN     (=stitch)
    NNU    weighing 16 st
    NPL    Baker St
    NPT    St John

V, v
    IN     centigrade v fahrenheit
    NN     the selection of v 2 (=verse)
    NNU    at 110 v
    NP     Mr E V Small
    VB     v below

An abbreviation may be part of a hyphenated form, as in: KANU-tribes NNS (G68:81), a 10-lb (JJB) bird (E20:65), 11-in (NNU) or 12-in (NNU) is about the maximum permissible (J75:102).40 A tag is assigned to the whole sequence, according to the same principles as with hyphenated forms in general (cf 7.2).

NP-tags

NP is assigned to initials in people's names and to abbreviations naming countries or organisations. Note also abbreviations of books in the Bible. NPL is used with abbreviated locative nouns with a word-initial capital. NPT is found with abbreviated titular nouns with a word-initial capital, including the letters added after a person's name to indicate degrees, honours, qualifications, etc. For more examples, see 7.7.

NNU(S)

The tag NNU is used with abbreviated noun-like measurement units unmarked for number (singular/plural). NNU applies to many abbreviations; see the examples above. But NNUS is used when the form is marked for plural; see the examples above.

Apart from abbreviations, NNU applies to: $, *+ (£), %, *@ (degree symbol), /- (see 7.22 and 7.24). Note also the tagging of expressions for sums of money: $10 NNU, *+10 NNU. Idiom tagging is used with: per cent NNU NNU".

In the original Lob Corpus there are many combinations of a numeral and an NNU form (without an intervening space): 18ft, 14pts, 5%, etc. A space was inserted after the numeral in the tagged text and the tagging is therefore: 18 (CD) ft (NNU), 14 (CD) pts (NNUS), 5 (CD) % (NNU), etc. An NNU form may be hyphenated; see above.
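The separation of numeral and unit can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name split_measure and the two unit sets are hypothetical, and the sets contain only the unit forms quoted as examples in this section.

```python
import re

# Unit forms taken from the examples in this section (not an exhaustive list).
SINGULAR_UNITS = {"cm", "ft", "in", "mph", "sec", "yd", "%"}
PLURAL_UNITS = {"galls", "gns", "hrs", "ins", "lbs", "mins", "pts", "yds"}

# A numeral (digits, optionally with . , / inside) directly followed by non-digits.
_NUM_UNIT = re.compile(r"^(\d[\d.,/]*)(\D+)$")


def split_measure(token):
    """Split e.g. '18ft' into [('18', 'CD'), ('ft', 'NNU')].

    Tokens that are not a numeral plus a known unit are returned unchanged
    with tag None, leaving them to the other rules of the tagging scheme.
    """
    match = _NUM_UNIT.match(token)
    if not match:
        return [(token, None)]
    number, unit = match.groups()
    if unit in PLURAL_UNITS:
        return [(number, "CD"), (unit, "NNUS")]
    if unit in SINGULAR_UNITS:
        return [(number, "CD"), (unit, "NNU")]
    return [(token, None)]


for tok in ["18ft", "14pts", "5%", "10a"]:
    print(tok, split_measure(tok))
```

Restricting the split to a known unit list keeps forms such as 10a (tagged CD as a whole; see 7.17) intact.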

7.20 Non-standard forms

In the original Lob Corpus there were special codes for non-standard forms; see Johansson et al (1978:29f). These codes have been removed from the words in the tagged text, but a marker is inserted under 'special information' in the vertical version (see 2.2 and 2.6). In the horizontal version and the concordance there are no indications of non-standard forms. This may be confusing, particularly with nonce-forms like vicilisation (=civilisation) and bunkrapt (=bankrupt); most of these are found in a single text (R07).

Non-standard forms are tagged in the same way as the corresponding standard forms. Examples: yer (=your) PP$, ze (=the) ATI, t'ing (=thing) NN.

7.21 Foreign words and expressions

In the original Lob Corpus there were special codes for foreign-language material; see Johansson et al (1978:28f). These codes have been removed from the words in the tagged text, but markers are entered under 'special information' in the vertical version (see 2.2 and 2.6).

Words with codes for 'foreign word or expression' (\), 'Cyrillic alphabet' (\ 11), and 'Greek alphabet' (\15) in the original Lob Corpus are tagged &FW. The tagging of 'foreign word or expression widely used' (\6) varies. Single words are generally tagged in the same way as ordinary English words, e.g.: post-mortem NN or JJB,41 adagio (JJ) tempo, pace IN, qua IN, sic RB. Phrases are either &FW (word by word: in extenso, mirabile dictu, sine qua non, etc) or idiom-tagged;42 for some examples of idiom-tagging, see 7.2. Note that the tagging may vary depending upon syntactic position. Examples:

ad lib
    JJ JJ"    changing from ad lib to restricted feeding
    RB RB"    to feed young pigs ad lib
    (both in E37:41-6)

a priori
    JJ JJ"    a priori reasoning (J44:13)
    RB RB"    determine a priori (G58:26)

Foreign abbreviations are dealt with in the same way as English abbreviations. For some examples, see 7.19.

7.22 Formulas and scientific symbols

A 'formula' in the tagged Lob Corpus is any word containing non-alphabetical characters which cannot be tagged as anything more specific. The tag &FO applies to:

See also the treatment of numerals in 7.17. As regards /, see 7.24.

7.23 Cited forms

The tag NC applies to quoted single words or short phrases embedded in sentences, as in:

  1. We find our colloquial term 'twister' near the mark. D03:106
  2. 'Now, when I hear a German say 'the party' I always think of the Nazis,' Sonia laughed... K10:148

There is also a marker under 'special information' in the vertical version of the text (see 2.2 and 2.6). NC is not used to tag direct speech in the dialogue passages. Nor is it used with titles of books and works of art; these receive the regular word tags, but a marker is inserted under 'special information' in the vertical version of the text (see 2.2 and 2.6).

7.24 Punctuation marks

The following characters are included in a word if they immediately precede an alphanumeric character:

- . , : ; / (as regards /, see further below)

Examples: 2-1 .5 2.1 1,000 E:2 2/3. In other contexts punctuation marks are treated as separate 'words' and are given their own tags. The tags are usually identical to the marks themselves as represented in the Lob Corpus (see Appendix 4). An apostrophe is never treated as a separate word. Note the following points.

Full stop

Full stops were inserted in the text of the tagged Lob Corpus under certain circumstances (see 5.2). They were deleted at the end of abbreviations. A full stop is only treated as a 'word' when marking the end of sentences (tagged .) and occasionally when used in a mathematical expression (tagged IN; cf 7.22). Ellipsis is marked by three full stops.

Colon

A colon may occasionally be IN: ... the survival ratio was high, at around 1: 100 (E07:77).

Slash (/)

Like some other punctuation marks, / is multi-functional. In the original Lob Corpus a space was inserted after /, except in the case of and/or and fractions. The last two types are still treated as single words in the tagged text (cf 7.14 and 7.17, respectively). In all other cases / was separated out and treated as a word by itself. The usual tags are either IN (=per) or CC. Examples: 90,000 gall / hour (E27:95), an explosive mixture of methane / air (J72:29). NNU applies in cases like (=shilling, shillings): 1 /- per share (H28:47), 3 /- per share (H27:86).

Hyphen

Sequences containing a hyphen are treated as one word. For examples of differences in tagging depending upon hyphenation, see 7.1. As regards the treatment of particular types of hyphenated forms, see 7.7 (proper nouns), 7.8 (attributive forms), 7.17 (numerals), and 7.19 (abbreviations). There is a clear distinction between hyphen (-) and dash (*-) in the Lob Corpus.

When - is surrounded by spaces, it is treated as a separate word. The tag IN applies when it is equivalent to minus (cf 7.22) and also in examples like (=to): from 4 1/2 - 7 years (J23:133).
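The word-boundary rule stated at the beginning of this section (the characters - . , : ; / belong to a word when they immediately precede an alphanumeric character, and are separate 'words' otherwise) can be sketched with a regular expression. The pattern and the function name tokenize are hypothetical, and the sketch assumes plain text in which the apostrophe is always word-internal; it does not model the special Lob Corpus codes such as *- or *'.

```python
import re

# One token is either
#   a run of word characters, where - . , : ; / are allowed only when the
#   next character is alphanumeric (so they attach to what follows), or
#   any other single non-space character, treated as a separate 'word'.
_TOKEN = re.compile(r"(?:[-.,:;/](?=[A-Za-z0-9])|[A-Za-z0-9'])+|\S")


def tokenize(text):
    """Split text into words and free-standing punctuation marks."""
    return _TOKEN.findall(text)


print(tokenize("2-1 .5 2.1 1,000 E:2 2/3"))
print(tokenize("from 4 1/2 - 7 years"))
print(tokenize("90,000 gall / hour"))
```

A hyphen or slash surrounded by spaces comes out as its own token, which is the case where tags such as IN or CC become available for it.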

Apostrophe

An apostrophe may open or close a word: a' (=all), 'em (=them), no' (=not), 'n (=and, in), an' (=and), o' (=of), wi' (=with), etc. Many such forms are written as separate words and are then tagged in the same way as the full forms. Contractions forming regular patterns are split up in the tagged Lob Corpus and each part is treated as a separate word. Examples:

I'll            becomes   I 'll
she's           becomes   she 's
John's          becomes   John 's
let's           becomes   let 's
d'you           becomes   d' you
haven't         becomes   have n't
ain't (aint)    becomes   ai n't (aint)
shan't          becomes   shall n't
won't           becomes   will n't
y'know          becomes   y 'know

There is no splitting up when the apostrophe marks the genitive inflection: John's, workers', somebody's, other's, others', one's, else's, etc. Nor are words split up in the less common cases where the apostrophe + s mark the plural: MP's, the 1930's, etc. Note further: a'right RB, o'clock RB, rock'n'roll NN, d'Alba NP, O'Hara NP, etc. The following colloquial forms (not marked by an apostrophe) are tagged as verbs: gonna VBG, wanna VB (=going to, want to).
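The splitting of contractions can be sketched as a small function. This is a minimal illustration, not the procedure used for the corpus: the names LISTED_FORMS and split_contraction are hypothetical, irregular results are returned in lower case, and the caller must say (via the hypothetical inflectional flag) whether a final 's is a genitive or plural marker, since that cannot be decided from the spelling alone.

```python
# Forms whose split cannot be derived by cutting off a regular ending;
# taken from the examples in this section.
LISTED_FORMS = {
    "shan't": ["shall", "n't"],
    "won't": ["will", "n't"],
    "ain't": ["ai", "n't"],
    "d'you": ["d'", "you"],
    "y'know": ["y", "'know"],
}


def split_contraction(word, inflectional=False):
    """Split a contraction into its parts, or return [word] unchanged.

    inflectional=True means a final 's marks the genitive or the plural
    (John's hat, MP's), in which case the word is not split.
    """
    lower = word.lower()
    if lower in LISTED_FORMS:
        return list(LISTED_FORMS[lower])
    if lower.endswith("n't"):
        return [word[:-3], "n't"]
    if lower.endswith("'ll"):
        return [word[:-3], "'ll"]
    if lower.endswith("'s") and not inflectional:
        return [word[:-2], "'s"]
    # Everything else (o'clock, rock'n'roll, d'Alba, O'Hara) stays whole.
    return [word]


for form in ["I'll", "she's", "haven't", "won't", "o'clock"]:
    print(form, split_contraction(form))
print("John's", split_contraction("John's", inflectional=True))
```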

There is a clear contrast between apostrophe (') and quotation marks (*' or **') in the tagged Lob Corpus.

7.25 Letters

Even the seemingly simple matter of tagging letters is not always straightforward. In the first place, we must cope with different alphabets (p, pi, etc). Secondly, a single letter may have a variety of uses. As an example, consider the letter a (excluding abbreviations, which were given special codes in the original Lob Corpus; cf 7.19):

&FW    e.g. vis a vis
AP     a few, a little (AP AP")
AT     e.g. a baby
IN     e.g. a la (IN IN") Dietrich
JJ     e.g. a priori (JJ JJ") reasoning
RB     a little (RB RB")
ZZ     e.g. appendix A

The tag ZZ is only used when it is not possible to assign a more specific tag. Other examples of letters assigned different tags are (again excluding abbreviations): I PP1A or ZZ, m UH or ZZ, x IN (cf 7.22) or ZZ.

When a letter has noun inflection, it receives an NN-tag, as in: substituting for the D's (J76:8), B can partake in A's light-signalling experiment (J51:53).

As regards the tagging of combinations of letters and numerals, see 7.17. For the tagging of letter sequences in a scientific context, see 7.22.