7.5 Auxiliaries
BE and HAVE are given special tags: BE, BEM, BER, BEZ, BED, BEDZ, BEN, BEG; HV, HVZ, HVD, HVN, HVG (see Appendix 4). No distinction is made between uses as auxiliaries and as lexical verbs. Forms of DO are given special tags (DO, DOZ, DOD), except done and doing, which are always used as lexical verbs and are marked VBN and VBG, respectively. DARE and NEED are tagged MD when they lack -s in the third person singular and/or are used without DO-periphrasis in interrogative sentences and negative constructions. In other contexts DARE and NEED are treated as lexical verbs. HAVE TO receives the regular HV... tags. Note the tagging of: used (VBD) to, gonna (VBG), wanna (VB). Occasional problems arise with: 'd (HVD or MD), 's (BEZ or HVZ), and ain't (BER, BEZ, HVZ + XNOT).
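The lookup character of these rules can be illustrated with a minimal Python sketch. The pairing of word forms with tags is an assumption based on the usual Brown/LOB convention (Appendix 4 remains the authority), and the names AUX_TAGS and candidate_tags are hypothetical. Forms with more than one candidate tag must still be resolved from context.

```python
# Candidate tags for forms of BE, HAVE and DO.
# Assumption: the form-to-tag pairing follows the usual Brown/LOB
# convention; it is not copied from Appendix 4.
AUX_TAGS = {
    "be": {"BE"}, "am": {"BEM"}, "are": {"BER"}, "is": {"BEZ"},
    "were": {"BED"}, "was": {"BEDZ"}, "been": {"BEN"}, "being": {"BEG"},
    "have": {"HV"}, "has": {"HVZ"}, "had": {"HVD", "HVN"}, "having": {"HVG"},
    "do": {"DO"}, "does": {"DOZ"}, "did": {"DOD"},
    # done and doing are always treated as lexical verbs.
    "done": {"VBN"}, "doing": {"VBG"},
    # Contracted forms: only the verbal readings named in the text.
    "'d": {"HVD", "MD"}, "'s": {"BEZ", "HVZ"},
}


def candidate_tags(form):
    """Return a copy of the candidate tag set for a form (empty if unknown)."""
    return set(AUX_TAGS.get(form.lower(), ()))


print(sorted(candidate_tags("had")))    # two candidates, context decides
print(sorted(candidate_tags("doing")))  # lexical verb tag only
```

The set-valued result makes the ambiguous cases ('d, 's, had) visible instead of hiding them behind a single default.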
7.6 Nouns: number and case
Distinguishing singular and plural forms of nouns is not always straightforward. Compare:
The italicized nouns in (1) and (4) are clearly singular and those in (2) and (5) clearly plural, as shown by features in the context (determiners, subject-verb concord, coordination with plural nouns). In (3) and (6) there are no grammatical clues to number, and we must rely on pragmatics or simply leave the tag selected by the automatic tagging programs. Examples like (7) are the most difficult ones to handle. Taken by itself deer in this context would normally be treated as NNS; but bear is clearly singular. Since names of animals in the singular can be used collectively (cf. Caribou, musk-ox, polar bear, snow-goose, walrus and seal... F36:205), we chose the tag NN.13
Problems may arise with words denoting different species of birds and fish, which often do not have a distinctive plural form. Examples:
Example (8) was treated like (7). In (9) the tag NNS was chosen because of the following plural possessive form. In (10)-(11) we were guided by subject-verb concord. NN was chosen in (13), since attributive nouns are generally singular. In (14) the reference is to fish as food and the tag NN was assigned, since typical count nouns have comparable singular uses. In the following example the tagging is the result of an attempt to imagine the situation depicted (and the tag NNS is no doubt also possible in the first two cases):
The word fish itself is often difficult to tag. Compare:
NN vs NNS can safely be used where there are clear grammatical clues in the context, as in (16)-(19). (20) and (21) show that coordination is not a sufficient criterion. NNS was assigned in the former case, as the reference is to concrete specimens, and NN in the latter, where the reference is to fish as food. Finally, the attributive position in (22) points towards NN.
Number is also uncertain in nouns ending in -ics. Where these refer to sciences or subjects of study, singular use is normal (there should be no need to illustrate this). Examples of the plural:
NNS has been applied where there are clear grammatical clues pointing towards the plural. With some -ics nouns which are frequently used in the plural (tactics, statistics), meaning has also been used as a guide in the tagging.
Special problems are also caused by the following two nouns: means, people. Examples:
Here we can go by determiner or verb agreement. Where there are no such clues, the tagging has been slanted towards NNS, since this is the most common tag in clear cases. Note, in particular, NNS in phrases like: by all/any/no means (IN IN" IN" is used with the phrase: by means of). Meaning has also been a guide in the case of people. Where the reference is to 'a tribe, a nation', the preferred tag has been NN. Note, however, that such examples are found both with singular and plural concord. Compare:
Where number is indeterminate, it would probably have been a good idea to assign a tag reflecting this or to allow a word to have more than one tag. We decided, however, to stick to our tag set and not to depart from our principle of assigning a single tag to each word.
While the tagging of singular vs plural causes problems with a good number of examples, case is seldom uncertain. The general rule has been to use the genitive tag ($) wherever there is an apostrophe in a modifying noun. Occasional difficulties arise in examples like:
Although there is a clear preference for the genitive in cases like (31)-(32) and (35)-(36) and an equally clear preference for the common case in examples like (33)-(34), the forms have simply been tagged NNS$ vs NNS according to the presence vs absence of the apostrophe. Similarly, the genitive form in (37) has been treated strictly in accordance with the spelling, and the tag NN$ has been assigned.
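Because case is decided strictly by spelling, the rule reduces to a check for an apostrophe. The following Python sketch assumes that the number tag (NN or NNS) has already been chosen. The function name case_tag and the sample words are hypothetical and are not taken from the corpus.

```python
def case_tag(word, base_tag):
    """Append the genitive marker ($) when the spelling contains an apostrophe.

    base_tag is the number tag already chosen for the noun (NN or NNS).
    """
    has_apostrophe = "'" in word or "\u2019" in word  # straight or curly
    return base_tag + "$" if has_apostrophe else base_tag


print(case_tag("teachers'", "NNS"))
print(case_tag("teachers", "NNS"))
print(case_tag("child's", "NN"))
```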
7.7 Proper nouns
As explained in 5.1, word-initial capitals have been removed in many descriptive names; the changes are recoverable from the 'special information' in the vertical version of the tagged corpus (see 2.6). Word-initial capitals only remain on words habitually written with a word-initial capital, and these words are subclassified using the tags NP, NPL, NPT, NNP, and JNP.
NP - 'true' proper names
This tag, which is far more restricted than in the tagged Brown Corpus, includes:
1. Personal names: first names and surnames (whether English or foreign); initials in names like F. D. Roosevelt; nicknames; names of supernatural beings (incl. God); etc. 14
Note that von, de, van, etc without word-initial capitals in foreign names are tagged &FW.
2. Names of animals, including horses' names such as: Team Spirit, St Paddy, Sagacity, Bold Liver, etc.
3. Place-names: continents (e.g. Africa); countries (e.g. Great Britain); cities (e.g. New York); oceans (e.g. the Pacific); deserts (e.g. the Sahara); etc.
In place-names combined with a descriptive locative noun such as Forest, Ocean, Sea, etc the latter words are tagged NPL (see below).
The tag NP is also used for ordinary lexical words which form part of conventionalised names and are spelled with an upper-case initial: the Cape of Good Hope, the United Kingdom, Clock Court, etc.
Articles and prepositions are given their usual non-NP tags when they are part of a place-name and are spelled with a lower-case initial, as in: the Cape of Good Hope. The same applies to function words with lower-case initials in other types of names.
4. Abbreviations in classes 1 and 3 above (e.g. the U.S.A.) and abbreviations naming organisations: NATO, T.U.C., UN, etc. But note that the general rule has been to convert upper-case initials to lower-case (and use non-NP tags) with descriptive names of organisations and cultural institutions: the north Atlantic treaty organisation, the Moorlands building society, general electric, the united nations, New York's metropolitan opera, the Indian national congress, etc.
5. Names of holidays: Christmas, Easter, etc. Note, however, that the words for the months and the days of the week are tagged NR (see 7.10).
6. Names of hotels, restaurants, and pubs: the Good Companions, the White Horse, the Elephant and Castle, etc.
7. Names of football teams: Dundee United, Queen's, Dunfermline Athletic, etc. Where the name is morphologically plural, the tag assigned is NPS (see below).
8. Brand-names: Coca-cola, Jaguar, Sellotape, Xerox, etc. Even when a brand-name is used like a common noun (e.g. Would you like a Coca-cola?), it is tagged NP as long as it is capitalised. Note in this connection also the NP tag in: a double Scotch.
9. Personifications with word-initial capitals, as in: And Desire came up with a straight left and Anger staggered... (K07:46). Personifications behave like personal names, and so are tagged NP.
10. Miscellaneous: the Union Jack, Blue Streak, Congress, Parliament, the House of Commons, Allied troops, the Bible (and names of books in the Bible), etc.
NPS - plural proper names
Names which are both morphologically and syntactically plural are tagged NPS. Examples:
1. Personal names: the Wilsons, the Smiths, etc.
2. Place-names: the Alps, the Andes, the West Indies, etc.
3. Names of sports teams and music bands: Bristol Rovers, Spurs, Rangers, Wasps, the Coronets, etc.
4. Names of companies: Richardsons, Lloyds, Bents, etc.
5. Miscellaneous: Salems (cigarettes), Shells (shares), etc.
If the name is treated syntactically as singular, it is tagged NP rather than NPS: Chalfont Heights, the United States, 15 the New York Times, etc. Plurals of words of the types NPL, NPT, and NNP are tagged NPLS, NPTS, and NNPS, respectively.
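The choice between NP and NPS depends on two properties together. It can be sketched in Python as below, on the assumption that both properties have already been judged by the tagger or post-editor. The names np_number_tag and PLURAL_OF are hypothetical.

```python
# Plural counterparts of the proper-noun tags named in the text.
PLURAL_OF = {"NP": "NPS", "NPL": "NPLS", "NPT": "NPTS", "NNP": "NNPS"}


def np_number_tag(morph_plural, synt_plural):
    """NPS only when the name is plural both in form and in syntax."""
    return "NPS" if (morph_plural and synt_plural) else "NP"


print(np_number_tag(True, True))    # e.g. the Alps
print(np_number_tag(True, False))   # e.g. the United States
print(PLURAL_OF["NPL"])
```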
NPL(S) - locative nouns with word-initial capital
Examples of locative nouns are: Bay, Bight, Cape, Firth, Hill, \0Is, Island, Isle, Lake, Loch, \0Mt, Mount, Mountain, Peninsula, Plain, Point, \0Rd, Road, \0St, Street, \0Sq, Square, Valley, Wood. Locative nouns often combine with 'true' proper nouns to form place-names, as in:16
Loch (NPL) Ness (NP)
the Firth (NPL) of Forth (NP)
The tag NPL also applies to various types of buildings or monuments, even when the name designates an institution rather than the building itself, e.g.: Abbey, College, Hotel, \0Ho, House, Inn, Monument, Palace, School, University, etc. Examples:
Lancaster (NP) University (NPL)
the Houses (NPLS) of Parliament (NP)
NPT(S) - titular nouns with word-initial capital
Titular nouns usually co-occur with personal names (as in King Charles, President Kennedy), but they can also occur on their own (as in Can I help you, Sir?), with another NPT (as in \0Mr President), or as the head of a naming expression (as in the King of Sweden). The following classes of titular nouns spelled with an upper-case initial are tagged NPT:
1. Hereditary titles and titles of respect which do not designate a specific occupation: Duchess, Duke, Emperor, Empress, King, Lord, \0Mr, \0Mrs, Miss, Prince, Princess, Queen, Sir, etc.
2. Clerical titles: Bishop, Canon, Cardinal, \0Rev, Reverend, etc.
3. Titles denoting military rank: Admiral, Captain, Colonel, Private, etc.
4. Family relation terms like: Aunt, Brother, Dad, Father, Grandad, Grandma, Mother, Sister, Uncle. 17
5. The letters added after someone's name to indicate degrees, honours, qualifications, etc: \0B.A, \0M.Sc, \0Ph.D, \0C.B.E, etc. The same applies to \0Esq. 18
6. Titles denoting important official posts or occupations: President, Professor, \0Dr, Doctor, Justice, Vice-President, Chancellor, Minister, \0M.P, etc. This class is problematic. In general, multi-word titles (e.g. Warrant Officer) are not tagged NPT. Exceptions are:
Assistant Secretary BECOMES assistant (NN) Secretary (NPT)
NNP(S) - common noun habitually written with a word-initial capital
Although they are spelled with an upper-case initial, NNPs act syntactically and morphologically as common nouns. Generally they can be preceded by an article and have singular and plural forms. Examples:
1. Nouns referring to races, inhabitants of countries, etc: a Londoner, some Mexicans, a Jew, an Englishman, the English (NNPS), Japanese (NNP or NNPS), etc.
2. Nouns referring to languages and dialects, as in: He speaks English, French, and Japanese.
3. Nouns referring to people belonging to a particular period, party, faction, etc, often ending in -ist, -ite, -ese, -an: a Gaullist, a Thatcherite, an Etonian, the Elizabethans (NNPS), etc.
4. Nouns referring to factions, parties, etc, ending in -ism, as in: Gaullism.
As the examples show, NNPs are generally derived in some form from 'true' proper nouns. Sometimes we find derived forms like: a New Yorker, the South Africans. In these cases only the last word is tagged NNP:
a New (NP) Yorker (NNP)
the South (NP) Africans (NNPS)
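The treatment of compound derived forms amounts to giving every word but the last the tag NP and the last word the derived tag. A minimal Python sketch follows. The function name tag_compound_derived is hypothetical, and the input is assumed to hold only the capitalised words of the name, without the article.

```python
def tag_compound_derived(words, final_tag):
    """Tag all words NP except the last, which gets the derived tag."""
    *initial, last = words  # raises ValueError on an empty list
    return [(w, "NP") for w in initial] + [(last, final_tag)]


print(tag_compound_derived(["New", "Yorker"], "NNP"))
print(tag_compound_derived(["South", "Africans"], "NNPS"))
```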
JNP - adjective habitually written with a word-initial capital
Like NNPs, these are generally derived from 'true' proper names. Examples:19
1. Adjectives derived from personal names, as in: the Victorian era, Gaullist policies, etc.
2. Adjectives derived from place-names, as in: Asian culture, Japanese technology, etc.
3. Adjectives derived from tribal or racial names, as in: the Jewish religion, Arabic literature, etc.
As with NNPs, we may find 'compound' derived forms:
the East (NP) German (JNP) authorities
the South (NP) African (JNP) government
Many forms can be either JNP or NNP (and sometimes also NNPS), depending upon context, e.g.: a Chinese (JNP) vase, speak Chinese (NNP), many Chinese (NNPS). The same applies to: Dutch, English, etc. Sometimes there are problems of demarcation. Note the ambiguity of an example like: the French teacher. Here the tag could be either JNP ('the teacher from France') or NNP ('the person who teaches French').
NP vs JNP, NNP
The general rule has been to use the tags JNP and NNP(S) only with derived forms (which can to a large extent be identified by automatic procedures). Where an unaltered proper noun modifies another noun, the tag used is NP: the Heath administration, the Malawi government, the Pacific provinces, etc. A more questionable decision was to use the NP tag with tribal names like: Apache, Eskimo, Zulu. There is some inconsistency in the treatment of such words. Note the NP tag in: a Soviet officer, Soviet Russia, a distinguished Soviet actor, etc.
Hyphenated forms
Proper names are sometimes preceded by prefixes spelled with a lower-case initial. The tag assigned is then generally that appropriate to the second element: the mid-Wales (NP) branch, ex-Minister (NPT), the first all-Hebrew (JNP) secondary school, etc. The tag appropriate to the second element is also assigned in cases like: the Hebrew-reading (JJ) public, \0KANU-tribes (NNS), etc.
The reasoning here is that grammatical class in English is determined by the last element of a complex word.
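The principle that the last element decides the tag can be sketched in Python as follows. The small lexicon is hypothetical and stands in for whatever tag the second element would receive in its context. The function name tag_hyphenated is likewise an assumption.

```python
def tag_hyphenated(word, tag_of):
    """Return the tag of the last hyphen-separated element of word.

    tag_of maps an element to the tag it would receive in context.
    """
    last = word.rsplit("-", 1)[-1]
    return tag_of[last]


# Hypothetical, context-resolved tags for the second elements.
lexicon = {"Wales": "NP", "Minister": "NPT", "Hebrew": "JNP",
           "reading": "JJ", "tribes": "NNS"}

for form in ["mid-Wales", "ex-Minister", "all-Hebrew", "Hebrew-reading"]:
    print(form, tag_hyphenated(form, lexicon))
```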
NP vs non-NP
The borderline between NP and non-NP is not always clear, and this has no doubt resulted in some inconsistency in the tagged corpus. One problematic area is the change to lower-case initials in descriptive naming expressions (cf. 5.1) and the consequent use of non-NP tags. Where should the line be drawn between naming expressions which describe their referents and conventionalised names whose selection is more or less arbitrary? The solution chosen here was to differentiate between categories of names which are predominantly one or the other (cf above under NP). The problem is compounded by the tendency in English to capitalise certain types of common nouns. Compare:
North West Norway BECOMES north west Norway (NP)
East Germany REMAINS East (NP) Germany (NP)
Capitals are removed in the first example, which is not an established name.20
Another problem is the treatment of words indicating political and religious persuasions, movements in art and music, etc: Democrat, Republican, Nazi, Catholic, Methodist, Protestant, Cubism, Impressionist, etc. These words are similar in meaning and form to those tagged JNP but, in contrast to the latter, they are not derived from proper names. Since capitalisation is variable, the general rule has been to convert initial capitals to lower case and use non-NP tags (JJ or NN). Again, there is some inconsistency in the tagged corpus.
7.8 Adjectives
Adjectives present three major tagging problems (in addition to those already dealt with: -ed forms, see 7.3; -ing forms, see 7.4): JJ vs JJB, JJ vs NN (see 7.9), and JJ vs RB (see 7.11). The tag JJB is used for adjectives which are restricted to attributive position before a noun. A form is either JJ or JJB. No attempt has been made to distinguish between examples like: a true story (cf the story is true), a true scholar (cf *the scholar is true). Both are tagged JJ. Forms tagged JJB have no predicative uses. The principal cases are:
1. limiting adjectives, as in:
Other limiting adjectives are: lone, main, principal, selfsame. Limiting adjectives are similar in function to determiners. Note that a typical limiting adjective like chief can occasionally be found outside attributive position:
Since such uses are exceptional, they have not been allowed to affect our classification of chief as a JJB adjective.
2. intensifying adjectives, as in:
Other intensifying adjectives are: arch, entire, mere, sheer.
3. forms denoting location in place or time, as in:
Other JJB forms of this type are: bottom, centre, down, erstwhile, hind, indoor, inland, inner, inside, interim, inward, medium, mid, middle, midland, nearby, nether, off, onward, outdoor, outward, overseas, seaward, sideways, top, upper, westward, windward, yonder. Many of these forms can also be used in adverbial positions and are then given an adverb tag. Note that present (as in: the present king) is tagged JJ rather than JJB, since it can also be used as a predicative adjective (albeit in a different sense, as in: the king was present).
4. hyphenated attributive forms, as in:
This group, which is certainly the largest and most varied, was assigned the tag JJB through a routine in the automatic tagging programs. Common patterns are: numeral-noun (as in 13), prefix-noun (as in 16), adjective-noun (as in 18). JJB does not apply to hyphenated forms which also occur outside the attributive position.21 The automatic tagging program assigned the tag JJ rather than JJB to hyphenated -ed and -ing forms in examples like:
Denominal -ed combinations like those in (25) are often found in non-attributive position, and the JJ tag is therefore appropriate. Hyphenated deverbal -ed and -ing forms are no doubt restricted to attributive position in the great majority of cases. The tagging was, however, only changed to JJB for a limited number of combinations, mainly of the type adverb + -ed or -ing (as in 23 and 24), where occurrence in non-attributive position is extremely unlikely.
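The default behaviour described for hyphenated attributive forms might be approximated as in the Python sketch below. This is an illustration and not the routine actually used in the tagging programs. The suffix test is deliberately naive, the function name is hypothetical, and the sample words are invented rather than taken from the numbered examples. The later manual change of some adverb + -ed/-ing combinations to JJB is not modelled.

```python
def hyphenated_attributive_tag(word):
    """Default tag for a hyphenated form in attributive position.

    Returns None for forms without a hyphen, which are outside this rule.
    """
    if "-" not in word:
        return None
    last = word.rsplit("-", 1)[-1].lower()
    # Naive suffix test: -ed and -ing forms default to JJ.
    if last.endswith("ed") or last.endswith("ing"):
        return "JJ"
    return "JJB"


for form in ["three-year", "blue-eyed", "fast-growing"]:
    print(form, hyphenated_attributive_tag(form))
```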
5. occasional hyphenated 'combining' forms, as in:
6. other attributive-only forms, as in:
Other forms in this group are: amateur (layout), crack (jockey), dire (need), everyday (life), express (command), favourite (brand), fellow (feeling), folk (music), fool (boy), foster (mother), freak (case), giant (oak), head (office), key (factor), lay (people), like (manner), maiden (speech), model (prisoner), moot (point), net (profits), period (costumes), prime (minister), pseudo (third way), quasi (particle), ritual (art), rival (theories), solo (baritone), teenage (idol), vice (king), virgin (forest).
Note that some of the words in the AP group (see 7.12) are closely related to limiting adjectives: cf sole (JJB), only (AP). In the last group (6) there is overlap with respect to NN (see the next section).
Finally, note that JJB does not include nominal adjectives like: atomic, chemical, industrial, medical, molecular, residential, urban. These cannot easily be listed or clearly defined in some other way by the automatic tagging programs. Moreover, although they are typically attributive, they can easily develop predicative uses. For these reasons, they received the general adjective tag JJ. In the revised tagging system, JJB will be used more extensively, not only for nominal adjectives but also for other adjectives which are exclusively or predominantly used in attributive position (but are usually not taken up in discussions of attributive-only forms in grammars), e.g.: actual, little, previous.
7.9 Adjective vs noun
There are two main problem areas: (1) the choice of JJ vs NN in attributive position, and (2) the tagging of typical adjectives in nominal positions.
Attributive position
In a noun phrase both adjectives and nouns can modify the head. Compare:
The modifier in the first example is tagged JJ, since it can appear in a clear adjectival position: it is grammatical. In the second example the modifier is NN, since it normally appears in nominal positions: grammar is abstract, John hates grammar. Grammatical can be qualified by an adverb of degree: quite grammatical. Grammar can be accompanied by determiners and modifiers: the grammar of German, German grammar.
Unfortunately, the distinction between adjective and noun is not always as clear as in the examples given so far. The typical JJB form does not really satisfy the criteria for either adjectives or nouns. The adjective tag is assigned by 'default', since attributive position is a fundamental characteristic of adjectives, while it is only one of the subsidiary positions of a noun. The tag JJB is applied (1) provided that the form is excluded from clear nominal positions, and (2) if a corresponding noun exists, provided that the attributive use is clearly predominant and/or clearly differentiated in meaning from the noun. See further the treatment of JJB in the preceding section.
While the forms tagged JJB satisfy our criteria for neither adjectives nor nouns, there are many others which can clearly be both adjectives and nouns. Note the coordination with a clear noun and a clear adjective, respectively, in:
Compare:
In these examples the attributive forms (the standard treatment, material progress) were grouped with the clear adjective uses, and this is no doubt often correct. JJ is assigned if there is a possible paraphrase with the form in predicative position: 'the treatment which is standard', 'progress which is material'. Note the use of the JJ tag with the following groups of words:
1. colour words, including silver, orange, hazel, cream, rose, etc. Compare:
2. words relating to religious and political persuasions, artistic movements, and the like: capitalist, catholic, escapist, extremist, humanist, impressionist, protestant, etc. Examples:
Compare:
Words in -ist are the most numerous in this group. The tag JJ was applied, as the attributive use is very common (and sometimes the only attested position in the corpus) and as clear adjective characteristics show up in some uses of the words. Note the predicative uses in (20), (22), and (23) above and the accompanying degree adverbs in (21) and (22). Note also that, while words like conservative, liberal, republican, etc can clearly be used as adjectives and are tagged JJ in attributive position, labour is tagged NN: the labour leader, labour views, etc.
3. miscellaneous examples (JJ or JJB vs NN):
NN is assigned where there are paraphrases with clear nouns ('a boxer who is an amateur', 'a lover of animals', 'investment of capital', etc). However, as with -ing forms (cf 7.4, the end), no claim can be made for complete consistency in the distinction between adjective and noun in attributive position.
Nominal positions
Many typical adjectives can appear in nominal positions. JJ is kept in the following cases (cf the treatment of -ed and -ing forms in nominal positions; see 7.3 and 7.4, VBG vs JJ):
1. ellipsis, as in:
The same tagging is used in economic contexts in expressions like: ordinary (shares), nominal (shares). There is a gradual transition to examples where the nominal use is freer and a head noun cannot be supplied from the immediate context: the final (i.e. dividend), the interim (i.e. dividend). NN was used in the latter case. Note also NN with: the Grand National (spelled with lower-case initials in the tagged corpus), oral (i.e. exam), etc.22
2. in expressions referring to groups of persons, as in:
But NN is used where the form can refer to a single individual, in which case there are usually other features pointing towards NN: a blonde, the blonde, blondes. NN is also used with (the) public, which behaves like a collective noun and is found quite freely in nominal positions: an international public, the general public, etc. Cf further NNP(S) with nationality words; see 7.7.
3. in expressions referring to abstract concepts, as in:
This type includes: above the ordinary, out of the ordinary. 23
4. coordinate expressions, as in:
Note also: for all and sundry (JJ).
The reasoning for the JJ tag is that the nominal use is severely restricted. The elliptic use is contextually restricted; the adjectiveness is shown by the possibility of inserting a noun or the propword ONE. The next two cases are limited to combinations with the definite article; the adjectiveness is shown by the possibility of inserting an adverb in examples like: the moderately rich, the completely impossible. The fourth case is limited to coordinated antonyms used in much the same sense as the definite expressions of the second and third type.
Forms which can be used more freely in nominal position are tagged NN. Some have acquired both the syntactic and the inflectional features of nouns, e.g.: a blonde, the blonde, blondes. Others normally lack noun inflections but can appear in nominal positions without the restrictions mentioned above. Examples:
Cases like the following are problematic:
In (52) no good was tagged RB JJ due to the similarity to clear cases of this kind: no better, no different, etc. The phrase violent red in (53) looks by itself like a noun phrase but is parallel to clear adjectives in the same sentence and red was therefore tagged JJ (and violent was treated like deep in deep red; cf 7.11). On the other hand, all the occurrences of dear in (54) were tagged NN. The first two examples are quite clear. It did not seem justified to treat the other examples differently; note the comma between the name and the adjective in the last example.
Idioms
Special problems are caused by idioms consisting of a preposition plus a form which is normally an adjective. Although this position is typically nominal, the tag NN seems inappropriate (if there are no clear nominal uses of the word). The treatment varies with the type of combination. In many cases we have resorted to idiom tagging: at large, for certain, in general, etc (see 7.2). The same type of tagging has been used for the idioms: in the main, of a sudden. Other taggings are:
IN ... NN: at present, for the present, by right, to the right, in the extreme, in the minor, in the raw, in the open, on the alert, on the contrary, on the defensive, on the whole, to the quick, the opposite of, the small of my back, their massive scrum gained firm control in the tight (A23:209), in the rough and tumble of journalistic life (G32:13), etc
IN ... RB: before (very) long, for (far too, nearly as, etc) long, etc
The second type is similar to examples of preposition plus adverb like: before then, until then, up to now. The possibility of internal variation speaks against idiom tagging. The first type is more varied. In some cases the NN tagging is no doubt justifiable, since the form can be used in a variety of nominal positions in much the same sense (present, right, extreme, etc). In other cases the nominal form is clearly differentiated in meaning from the corresponding adjective (quick, raw, small, etc), which speaks in favour of a separation from the adjective. This type gradually merges with the third main category of JJ tagging taken up above (the plus adjective used in an abstract sense). Using the argument of 'contextual restrictedness', we could probably have assigned JJ more freely to typical adjectives in nominal position. As it is, it is limited to four types where the nominal use is productive and the relevant adjectives cannot easily be listed.
Comparative and superlative forms
JJR and JJT are assigned to forms ending in -er and -est. The tag is dependent upon the first element in examples like: the better-managed firms, the better-sounding name, narrower-range investments, one of the best-looking men, etc. Note that the following words do not receive JJR/JJT tags: former (AP), latter (AP), fewer (AP), less (AP), more (AP), last (AP), least (AP), most (AP), inner (JJB), upper (JJB), further (JJB).
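A rough Python sketch of this rule is given below. It assumes that the input is already known to be an adjective, since a bare suffix test would misfire on words like other or water. The small table of irregular forms and the names EXCEPTIONS, IRREGULAR and degree_tag are hypothetical additions for illustration.

```python
# Words listed in the text as not receiving JJR/JJT.
EXCEPTIONS = {
    "former": "AP", "latter": "AP", "fewer": "AP", "less": "AP",
    "more": "AP", "last": "AP", "least": "AP", "most": "AP",
    "inner": "JJB", "upper": "JJB", "further": "JJB",
}
# Assumption: irregular forms that the suffix test would miss.
IRREGULAR = {"worse": "JJR", "worst": "JJT"}


def degree_tag(adjective):
    """Return JJR/JJT (or the listed exception tag) for an adjective form."""
    lowered = adjective.lower()
    if lowered in EXCEPTIONS:
        return EXCEPTIONS[lowered]
    first = lowered.split("-", 1)[0]  # the first element decides
    if first in EXCEPTIONS:
        return None  # hyphenated forms on an excluded word: not covered here
    if first in IRREGULAR:
        return IRREGULAR[first]
    if first.endswith("est"):
        return "JJT"
    if first.endswith("er"):
        return "JJR"
    return None


for form in ["better-managed", "best-looking", "narrower-range", "former"]:
    print(form, degree_tag(form))
```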
Comparative and superlative forms in nominal positions are treated in principle like the base forms of adjectives. JJR and JJT, respectively, are retained where a head noun is recoverable from the context: the biggest of the six contracting parties, all classes except the lower, etc. Adjective tags are also retained in nominal positions where the reference is to a group of persons or where the form is used in an abstract sense: the best and the brightest, at its best, do my best, on the best of going, at his sharpest, the worst was over now, bringing out the worst in women, etc. 24 But note NN where the reference is to an individual, as in:
Note also idiom tagging (RB RB") with: at best, at worst.
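The idiom tagging notation, in which the first word carries the tag and each following word carries the same tag with a ditto mark ("), can be generated mechanically. The Python sketch below uses the hypothetical function name idiom_tags. The two sample idioms and their tags are those mentioned in the text (at best, by means of).

```python
def idiom_tags(words, tag):
    """Give the first word tag and every later word tag plus a ditto mark."""
    return [(w, tag if i == 0 else tag + '"') for i, w in enumerate(words)]


print(idiom_tags(["at", "best"], "RB"))
print(idiom_tags(["by", "means", "of"], "IN"))
```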