Introduction to syntactic annotation


The parsing scheme for the PPCME2, the PPCEME, and the PCEEC uses a limited tree representation in the form of labelled parentheses. All open parentheses have an associated label, either a phrase label (NP, ADJP, etc.) or a word label (N, ADJ, etc.), representing nodes in a tree. We use the terms 'word label' and 'POS (part-of-speech) label' interchangeably. A word label is associated with every word, but phrasal labels are not included in every case in which a fully labelled tree would require them. Intermediate projections in the sense of X' theory (N', ADJ', etc.) are not generally included in our representations. By comparison to trees in current syntactic theory, the trees in our corpora are therefore quite flat, and they are not required to be binary-branching.

The partial representation of phrase structure in our corpora is not intended to make a theoretical statement, but was adopted for practical reasons. Certain phrases are generally omitted in the annotation scheme because their boundaries are too difficult to define. The prime example is VP. The problematic character of VP is particularly obvious in early Middle English, where the order of the verb and its complements is in flux (at least on the surface). But even in Present-Day English, the attachment site of verbal adjuncts is systematically ambiguous between low attachment to VP and high attachment at the clause level. Other categories, such as DP, were omitted because the cost of including them outweighs their usefulness. Intermediate projections are omitted for both reasons. In no case should the lack of any particular phrase label be taken to imply that earlier forms of English failed to include the corresponding syntactic category. The trees in the corpora are simply underspecified.

The examples in this section of the manual are constructed in Modern English so as to be maximally accessible. The remainder of the manual contains examples from the corpora. The examples are mostly from late Middle English and Early Modern English; examples from early Middle English are included where they are necessary to make a linguistic point.

General principles

As just mentioned, the structures in our corpora generally include neither a VP nor intermediate projections like I'. As a result, IP immediately dominates all verbs (to be understood in a broad sense, including modals and auxiliaries) and sentence-level constituents. A typical parse structure is the following:
(IP-MAT (NP-SBJ (NPR Mary))
        (HVP has)
        (BEN been)
        (VAG meaning)
        (IP-INF (TO to)
                (VB go))
        (PP (P for)
            (NP (D a) (N week))))
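For readers who want to process such structures programmatically, the following is a minimal sketch of how a labelled bracketing can be read into a nested data structure. It is not part of the corpus distribution; the function name `parse_tree` and the choice of nested Python lists are assumptions made for illustration only.

```python
import re

def parse_tree(text):
    """Read one labelled bracketing into nested lists of the form
    [label, child, child, ...]; a word node comes out as [POS, word]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack, root = [], None
    for tok in tokens:
        if root is not None:
            raise ValueError("unexpected material after the end of the tree")
        if tok == "(":
            stack.append([])              # open a new node
        elif not stack:
            raise ValueError("material outside any parenthesis")
        elif tok == ")":
            node = stack.pop()            # close the current node
            if stack:
                stack[-1].append(node)
            else:
                root = node
        else:
            stack[-1].append(tok)         # a label or a word
    if stack or root is None:
        raise ValueError("incomplete tree")
    return root

example = """
(IP-MAT (NP-SBJ (NPR Mary))
        (HVP has)
        (BEN been)
        (VAG meaning)
        (IP-INF (TO to)
                (VB go))
        (PP (P for)
            (NP (D a) (N week))))
"""

tree = parse_tree(example)
# Labels of the immediate daughters of IP-MAT: verbs and sentence-level
# constituents are sisters, since there is no VP node.
daughters = [child[0] for child in tree[1:]]
print(daughters)
```

The printed list makes the flatness of the structure visible: the subject, all three verbal elements, the infinitival clause, and the PP are sisters under IP-MAT. The sketch raises an error on unbalanced parentheses, so it can double as a basic well-formedness check.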

Dash tags (or extended tags)

Structural principles

Internal structure of phrases

The internal structure of all nonclausal phrases is fundamentally similar.

Internal structure of clauses

Clauses are labelled either CP or IP. CPs contain either a complementizer or a wh- position (or both). IPs contain neither. All IPs and CPs carry dash tags indicating their subtype, as follows.
IP-ABS absolute clause
IP-IMP imperative
IP-INF infinitive
IP-INF-ABS absolute infinitive
IP-INF-ADT adjunct infinitive
IP-INF-DEG degree infinitive
IP-INF-PRP purpose infinitive
IP-MAT declarative matrix IP
IP-PPL participial clause
IP-PPL-ABS absolute participial clause
IP-SMC small clause
IP-SUB subordinate IP

CP-ADV adverbial clause
CP-CAR clause-adjoined relative
CP-CLF IT cleft
CP-CMP comparative
CP-DEG degree complement
CP-EOP empty-operator CP
CP-EXL exclamative
CP-FRL free relative
CP-QUE question
CP-REL relative clause
CP-THT THAT clause
CP-TMC TOUGH movement complement
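
The way dash tags stack on a clause label can be illustrated with a short sketch. The table below copies the labels listed above; the helper names `split_label` and `describe_clause` are hypothetical. The sketch assumes that a purely numeric final field is a coindexation index and that any further tags (such as -SPE in the examples below) follow the subtype tag.

```python
CLAUSE_TYPES = {
    "IP-ABS": "absolute clause",
    "IP-IMP": "imperative",
    "IP-INF": "infinitive",
    "IP-INF-ABS": "absolute infinitive",
    "IP-INF-ADT": "adjunct infinitive",
    "IP-INF-DEG": "degree infinitive",
    "IP-INF-PRP": "purpose infinitive",
    "IP-MAT": "declarative matrix IP",
    "IP-PPL": "participial clause",
    "IP-PPL-ABS": "absolute participial clause",
    "IP-SMC": "small clause",
    "IP-SUB": "subordinate IP",
    "CP-ADV": "adverbial clause",
    "CP-CAR": "clause-adjoined relative",
    "CP-CLF": "IT cleft",
    "CP-CMP": "comparative",
    "CP-DEG": "degree complement",
    "CP-EOP": "empty-operator CP",
    "CP-EXL": "exclamative",
    "CP-FRL": "free relative",
    "CP-QUE": "question",
    "CP-REL": "relative clause",
    "CP-THT": "THAT clause",
    "CP-TMC": "TOUGH movement complement",
}

def split_label(label):
    """Split a node label into (category, dash_tags, index)."""
    parts = label.split("-")
    index = None
    # Assumption: a purely numeric final field is a coindexation index.
    if len(parts) > 1 and parts[-1].isdigit():
        index = int(parts.pop())
    return parts[0], parts[1:], index

def describe_clause(label):
    """Return the description of the longest listed label that `label`
    extends, or None if the label is not a listed clause type."""
    parts = label.split("-")
    for end in range(len(parts), 1, -1):
        candidate = "-".join(parts[:end])
        if candidate in CLAUSE_TYPES:
            return CLAUSE_TYPES[candidate]
    return None

for label in ["IP-INF-PRP", "IP-MAT-SPE", "IP-INF-1", "NP-SBJ-1"]:
    print(label, split_label(label), describe_clause(label))
```

Matching the longest listed prefix first means that IP-INF-PRP is reported as a purpose infinitive rather than as a plain infinitive, while a label with an additional tag, such as IP-MAT-SPE, still resolves to its clause type.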

Ordinary IPs

All IPs except subjectless imperatives and subjectless infinitives have a subject in our annotation scheme. If the subject is not overt, an empty subject is added. Clauses generally do not contain VP (but see Verb phrase). As a rule, daughters of IP are phrasal (but see Internal structure of phrases for exceptions).
(IP-MAT (CONJ But)                      <-- sentential conjunction
        (INTJ alas)                     <-- single-word interjection
        (, ,)
        (NP-SBJ (PRO we))
        (MD will)                       <-- modal
        (NEG not)                       <-- negation
        (Q all)                         <-- floated quantifier
        (VB end)                        <-- verb
        (RP up)                         <-- particle
        (PP (P with)
            (NP-OB1 (PRO$ our) (N favorite)))
        (. .))

( (IP-MAT-SPE (' ')
              (INTJ Yes)		<-- single-word interjection
              (, ,)
              (' ')
              (IP-MAT-PRN (NP-SBJ (PRO he))
                          (VBD seyde))	<-- verb
              (, ,)
              (' ')
              (NP-SBJ (PRO I))
              (MD shall)			<-- modal
              (VB promyse)			<-- verb
              (NP-OB2 (PRO you))
              (IP-INF (TO to)			<-- auxiliary
                      (VB fullfylle)		<-- verb
                      (NP-OB1 (PRO$ youre) (N desyre)))
              (E_S .)
              (' '))
  (ID CMMALORY,667.4880)) 

( (IP-MAT-SPE (CONJ for)				<-- conjunction
              (NP-SBJ (PRO he)
                      (CP-REL (WNP-1 0)
                              (C that)
                              (IP-SUB (NP-SBJ *T*-1)
                                      (MD shall)	<-- modal
                                      (VB pulle)	<-- verb
                                      (NP-OB1 (PRO hit))
                                      (RP oute))))	<-- particle
              (MD shall)				<-- modal
              (DO do)					<-- verb
              (NP-OB1 (PRO hit))
              (PP (P with)
                  (NP (Q litill) (N myght)))
              (E_S .)
              (' '))
  (ID CMMALORY,46.1512)) 

( (IP-MAT-SPE (CONJ and)				<-- conjunction
              (ADVP (ADV ellis))
              (NP-SBJ (PRO I))
              (MD wolde)				<-- modal
              (HV have)					<-- auxiliary
              (BEN bene)				<-- verb
              (ADJP (ADJ lothe)
                    (PP (P as)
                        (NP (Q ony) (N knyght)
                            (CP-REL (WNP-1 0)
                                    (C that)
                                    (IP-SUB (NP-SBJ *T*-1)
                                            (VBP lyvith)))))
                    (IP-INF (FOR for) (TO to)		<-- auxiliary material
                            (VB sle)
                            (NP-OB1 (D a) (N lady))))
              (E_S .)
              (' '))
  (ID CMMALORY,51.1701)) 
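
The subject requirement stated at the beginning of this section can be sketched as a simple consistency check. The code below is illustrative only: the names `parse_tree`, `subtrees`, and `ips_missing_subject` are hypothetical, the rule is taken at face value (every IP other than IP-IMP and IP-INF needs an NP-SBJ daughter), and cases such as coordinated IPs, where the subject sits inside the conjuncts, are not handled. The trees are abridged from the examples in this manual and are given without the outer wrapper and ID node.

```python
import re

def parse_tree(text):
    """Read a labelled bracketing into nested lists: [label, child, ...]."""
    stack, root = [], None
    for tok in re.findall(r"[()]|[^\s()]+", text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            if stack:
                stack[-1].append(node)
            else:
                root = node
        else:
            stack[-1].append(tok)
    return root

def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in node[1:]:
        if isinstance(child, list):
            yield from subtrees(child)

def ips_missing_subject(tree):
    """Return the labels of IP nodes that lack an NP-SBJ daughter."""
    missing = []
    for node in subtrees(tree):
        label = node[0]
        if not label.startswith("IP"):
            continue
        if label.startswith(("IP-IMP", "IP-INF")):
            continue                      # may be subjectless
        daughters = [c[0] for c in node[1:] if isinstance(c, list)]
        if not any(d.startswith("NP-SBJ") for d in daughters):
            missing.append(label)
    return missing

with_subject = parse_tree("""
(IP-MAT (NP-SBJ (PRO I))
        (MD shall)
        (VB promyse)
        (NP-OB2 (PRO you))
        (IP-INF (TO to)
                (VB fullfylle)
                (NP-OB1 (PRO$ youre) (N desyre))))
""")
no_subject = parse_tree("(IP-MAT (CONJ and) (VBD tolde) (NP-OB2 (PRO hym)))")
empty_subject = parse_tree(
    "(IP-MAT (CONJ and) (NP-SBJ *con*) (VBD tolde) (NP-OB2 (PRO hym)))")

print(ips_missing_subject(with_subject))
print(ips_missing_subject(no_subject))
print(ips_missing_subject(empty_subject))
```

Only the second tree is flagged. The third shows why: an empty subject such as `*con*` still counts as an NP-SBJ daughter, so the IP satisfies the requirement even though no subject is pronounced.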

Imperatives (IP-IMP)

Imperatives are labelled IP-IMP. Only overt subjects are included in the annotation.
( (IP-IMP-SPE (CONJ but)
              (VBI saye)
              (CP-THT (C 0)
                      (IP-SUB (NP-SBJ (PRO ye))
                              (BEP are)
                              (VAN diseased)))
              (E_S ,)) 
  (ID CMMALORY,4.83))

( (IP-IMP (CONJ for)
          (VBI witte)
          (NP-SBJ (PRO ye))		<-- overt subject
          (ADVP (ADV wele))
          (CP-THT (C +tat)
                  (IP-SUB (NP-SBJ (NPR god))
                          (MD may)
                          (VB se)
                          (NP-OB1 (CONJ ba+te) (N iuil) (CONJ and) (N gude))))
          (E_S ;))
  (ID CMBENRUL,12.418))

Non-wh CPs

THAT clauses (CP-THT), degree complements (CP-DEG), and certain adverbial clauses (CP-ADV) have the following basic structure:
(CP (C THAT/0)
    (IP ...))

The complementizer position is always included; when not filled by an overt complementizer, it contains 0 (zero).

(NODE (PP (P so)
          (CP-ADV (C that)
                  (IP-SUB (NP-MSR (NP (NUM thre) (NS dayes))
                                  (CONJP (CONJ and)
                                         (NP (NUM thre) (NS nyghtes))))
                          (NP-SBJ (PRO he))
                          (BED was)
                          (ADJP (ADJ specheles)))))
      (ID CMMALORY,6.172)) 

(NODE (PP (P til)
          (CP-ADV (C that)
                  (IP-SUB (NP-SBJ (PRO ye))
                          (VBP see)
                          (CP-THT (C 0)				<-- empty complementizer
                                  (IP-SUB (NP-SBJ (PRO ye))
                                          (VBP go)
                                          (PP (P unto)
                                              (NP (D the) (ADJR wers))))))))
      (ID CMMALORY,13.393))

(NODE (PP (P whan)
          (CP-ADV (C 0)
                  (IP-SUB (NP-SBJ (NP (D the) (N duke))
                                  (CONJP (CONJ and)
                                         (NP (PRO$ his) (N wyf))))
                          (BED were)
                          (VBN comyn)
                          (PP (P unto)
                              (NP (D the) (N kynge))))))
      (ID CMMALORY,2.11)) 

(NODE (IP-SUB (NP-SBJ (PRO we))
              (VBP departe)
              (PP (P from)
                  (ADVP (ADV hens)))
              (ADVP (ADV sodenly))
              (, ,)
              (CP-ADV (C that)
                      (IP-SUB (NP-SBJ (PRO we))
                              (MD maye)
                              (VB ryde)
                              (NP-MSR (Q all) (N nyghte))
                              (PP (P unto)
                                  (NP (PRO$ oure) (ADJ owne) (N castell))))))
      (ID CMMALORY,2.18)) 

(NODE (IP-SUB (NP-SBJ (PRO he))
              (VBD understood)
              (CP-THT (C that)
                      (IP-SUB (NP-SBJ (NPR syre) (NPR Ector))
                              (BED was)
                              (NEG not)
                              (NP-OB1 (PRO$ his) (N fader)))))
      (ID CMMALORY,9.271)) 

(NODE (ADVP (ADVR so) (ADV harde)
            (CP-DEG (C that)
                    (IP-SUB (NP-SBJ (N horse) (CONJ and) (N man))
                            (VBD felle)
                            (PP (P to)
                                (NP (D the) (N erthe))))))
      (ID CMMALORY,17.538)) 

(NODE (ADVP (ADVR so) (ADV merveillously)
            (CP-DEG (C that)
                    (IP-SUB (NP-OB1 (N doubte))
                            (NP-SBJ-1 (PRO it))
                            (BED was)
                            (IP-INF-1 (TO to)
                                      (VB here)
                                      (PP (P of)
                                          (NP (D that) (N bataille))))
                            (PP (P for)
                                (NP (D the) (ADJ grete) (N blood) (N shedynge))))))
      (ID CMMALORY,68.2325)) 

Wh- CPs

A number of clause types, listed below, contain both a wh- position and a complementizer position. This is to allow for the case in which both positions are filled. Empty wh- positions and empty complementizers are both indicated by 0 (zero). The wh- operator is coindexed with a trace of the same category. See Wh- traces for details, particularly The position of traces.
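
The coindexation between a wh- operator and its trace lends itself to a mechanical check. The sketch below works directly on the bracketed text of a single tree. The function name `check_wh_coindexing` and both regular expressions are assumptions made for illustration; in particular, "same category" is interpreted here as "the operator's label minus its initial W equals the trace's label minus its dash tags" (WNP with NP, WADVP with ADVP, and so on).

```python
import re

# An indexed wh- phrase, e.g. "(WNP-1" or "(WADVP-2".
WH_OPERATOR = re.compile(r"\((W[A-Z]+)-(\d+)\s")
# A wh- trace, e.g. "(NP-SBJ *T*-1)"; group 1 is the category
# without its dash tags.
WH_TRACE = re.compile(r"\(([A-Z]+)(?:-[A-Z0-9]+)*\s+\*T\*-(\d+)\)")

def check_wh_coindexing(bracketing):
    """Return a list of problems found in one tree given as a string."""
    operators = {int(i): cat for cat, i in WH_OPERATOR.findall(bracketing)}
    problems, seen = [], set()
    for cat, i in WH_TRACE.findall(bracketing):
        i = int(i)
        seen.add(i)
        if i not in operators:
            problems.append(f"*T*-{i} has no wh- operator")
        elif operators[i][1:] != cat:
            problems.append(
                f"*T*-{i} is {cat} but its operator is {operators[i]}")
    for i in sorted(set(operators) - seen):
        problems.append(f"{operators[i]}-{i} has no trace")
    return problems

good = """
(CP-QUE (WADVP-1 (WADV when))
        (C 0)
        (IP-SUB (ADVP-TMP *T*-1)
                (NP-SBJ (PRO they))
                (MD will)
                (VB come)))
"""
# Deliberately broken variants, constructed for illustration.
wrong_category = "(CP-REL (WNP-1 0) (C that) (IP-SUB (ADVP *T*-1) (VBP lyvith)))"
no_trace = "(CP-REL (WNP-1 0) (C that) (IP-SUB (NP-SBJ (PRO he)) (VBP lyvith)))"

print(check_wh_coindexing(good))
print(check_wh_coindexing(wrong_category))
print(check_wh_coindexing(no_trace))
```

Because indices are only meaningful within one tree, the function should be applied to each token separately rather than to a whole file. An operator may bind more than one trace, as in coordinated relative clauses, and the sketch accepts that.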

Verb fronting to C

Subject-verb inversion in V1 conditionals, questions, and exclamatives is not explicitly represented as verb movement to C in our annotation scheme. The inverted verb remains a daughter of IP. However, clauses with inversion differ structurally from ones without in not containing a C position.
(PP (P if)
    (CP-ADV (C 0)			<-- C, no inversion
            (IP-SUB (NP-SBJ (PRO I))
                    (HVD had)
                    (VBN known))))

(CP-ADV (IP-SUB (HVD had)               <-- inversion, no C
                (NP-SBJ (PRO I))
                (VBN known)))

(IP-MAT (NP-SBJ (PRO I))
        (DOP do)
        (NEG not)
        (VB know)
        (CP-QUE (WADVP-1 (WADV when))
                (C 0)				<-- C, no inversion
                (IP-SUB (ADVP-TMP *T*-1)
                        (NP-SBJ (PRO they))
                        (MD will)
                        (VB come))))

(CP-QUE (WADVP-1 (WADV when))           <-- inversion, no C
        (IP-SUB (ADVP-TMP *T*-1)
                (MD will)
                (NP-SBJ (PRO they))
                (VB come)))
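
Since the presence or absence of a C position is the only structural reflex of inversion in the annotation, it can be read off a CP node directly. The following sketch distinguishes the three possibilities described so far: an overt complementizer, an empty complementizer (0), and no C position at all. The names `parse_tree` and `complementizer_status` are hypothetical. Note that a result of "none" is what an inverted clause looks like, but it does not by itself prove inversion; a CP that merely groups conjoined CPs, for example, also has no C daughter of its own.

```python
import re

def parse_tree(text):
    """Read a labelled bracketing into nested lists: [label, child, ...]."""
    stack, root = [], None
    for tok in re.findall(r"[()]|[^\s()]+", text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            if stack:
                stack[-1].append(node)
            else:
                root = node
        else:
            stack[-1].append(tok)
    return root

def complementizer_status(cp):
    """Return 'overt', 'empty' or 'none' for the C daughter of one CP node."""
    for child in cp[1:]:
        if isinstance(child, list) and child[0] == "C":
            return "empty" if child[1] == "0" else "overt"
    return "none"

no_inversion = parse_tree("""
(CP-ADV (C 0)
        (IP-SUB (NP-SBJ (PRO I))
                (HVD had)
                (VBN known)))
""")
inversion = parse_tree("""
(CP-ADV (IP-SUB (HVD had)
                (NP-SBJ (PRO I))
                (VBN known)))
""")
that_clause = parse_tree("""
(CP-THT (C that)
        (IP-SUB (NP-SBJ (NPR syre) (NPR Ector))
                (BED was)
                (NEG not)
                (NP-OB1 (PRO$ his) (N fader))))
""")

for cp in (no_inversion, inversion, that_clause):
    print(cp[0], complementizer_status(cp))
```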

Fronting to pre-complementizer position

Fronting to Spec(CP)

Fronted elements can occupy Spec(CP), the position immediately preceding the complementizer. Since the specifier position is not explicitly indicated in our annotation system for any phrasal category, these elements simply appear within the CP in pre-head position. Such fronted constituents are coindexed with an *ICH* trace or with a resumptive (-RSP) phrase. For analogous cases of fronting in subordinate clauses that are introduced by a subordinating conjunction, see Fronting to Spec(PP).

Similar cases with an overt complementizer both before and after the fronted element are treated as CP recursion.

(NODE (IP-SUB (NP-SBJ (PRO hie))
              (VBP make+d)
              (CP-THT (NP-LFD (D +danne) (N man)  <-- left-dislocated NP in Spec(CP)
                              (CP-REL (WNP-1 0)
                                      (C +de)
                                      (IP-SUB (NP-OB2 *T*-1)
                                              (NP-SBJ (NPR godd))
                                              (NP-OB1 (PRO his))
                                              (VBD to-sant))))
                      (C +tat)
                      (IP-SUB (NP-SBJ-RSP (PRO he))
                              (VBP +turwune+d)
                              (PP (P on)
                                  (NP (PRO$ his) (N godnesse))))))
      (ID CMVICES1,149.1875)) 

( (IP-MAT-SPE (CONJ &)
              (NP-SBJ *con*)
              (NP-LFD (PRO$ ti) (N wil))
              (VBP iwur+de)
              (NP-OB1-RSP (PRO hit))
              (NP-VOC (ADJ deorwur+de) (NPR lauerd))
              (CP-ADV (CP-ADV (C +tt)
                              (IP-SUB (NP-SBJ (PRO ich))
                                      (PP (P +turh)
                                          (NP (PRO$ +ti) (N streng+de)))
                                      (MD mahe)
                                      (VB stonden)
                                      (PP (P wi+d)
                                          (NP (PRO him)))))
                      (, .)
                      (CONJP (CONJ &)
                             (CP-ADV (NP-LFD (PRO$ his) (ADJ muchele) (N ouergart))  <-- left-dislocated NP in Spec(CP)
                                     (C +tt)
                                     (IP-SUB (NP-SBJ (PRO ich))
                                             (NP-OB1-RSP (PRO hit))
                                             (MD mote)
                                             (VB afeallen)))))
              (E_S .))
  (ID CMMARGA,70.251)) 

Adjunction to CP

Material appearing before a wh- element must be adjoined to CP rather than occupying Spec(CP) (since the wh- element occupies that position), but our annotation does not explicitly express the distinction between the two types of positions. For the fronting of verbs to the pre-wh position, see Verb fronting in free relative clauses.
(NODE (IP-SUB (NP-SBJ (D tes) (ADJ unseli))
              (NEG ne)
              (MD +turue)
              (NEG nawt)
              (VB seggen)
              (, .)
              (CP-QUE-SPE (NP-LFD (PRO$ +ti) (NPR lauerd)
                                  (CP-REL (WNP-1 0)
                                          (C +tt)
                                          (IP-SUB (IP-SUB (NP-SBJ (PRO tu))
                                                          (VBP leuest)
                                                          (PP (P on)
                                                              (NP *T*-1)))
                                                  (, .)
                                                  (CONJP (CONJ &)
                                                         (IP-SUB (NP-SBJ *T*-1)
                                                                 (MD schulde)
                                                                 (NP-OB1 (PRO$ +ti) (N scheld))
                                                                 (BE beon))))))
                          (, .)
                          (WADVP-2 (WADV hwer))
                          (IP-SUB (ADVP-LOC *T*-2)
                                  (BEP is)
                                  (NP-SBJ-RSP (PRO he))
                                  (ADVP-TMP (ADV nu+de)))))
      (ID CMJULIA,122.464)) 

( (IP-IMP (VBI loke)
          (ADVP-TMP (ADV +tenne))
          (PP (ADV+P her-bi))
          (, .)
          (CP-QUE (NP-LFD (CP-FRL (WNP-1 (WPRO+ADV hwa-se))
                                  (C 0)
                                  (IP-SUB (NP-SBJ *T*-1)
                                          (PP (P of)
                                              (NP (PRO$ hire) (N mei+dhad)))
                                          (, ;)
                                          (VBP lihte+d)
                                          (PP (P in-to)
                                              (NP (N wedlac))))))
                  (, ;)
                  (WPP-2 (P bi)
                         (WNP (WQP (WADV hu) (Q monie))
                              (NS degrez)))
                  (C 0)
                  (IP-SUB (PP *T*-2)
                          (NP-SBJ-RSP (PRO ha))
                          (VBP falle+d)
                          (ADVP-DIR (RP+WARD dunewardes))))
          (E_S .))
  (ID CMHALI,144.244)) 

( (IP-MAT (CONJ and)
          (NP-SBJ *con*)
          (VBD tolde)
          (NP-OB2 (PRO hym))
          (CP-QUE (PP-1 (P whyle)
                        (CP-ADV (C 0)
                                (IP-SUB (NP-SBJ (PRO he))
                                        (VBD tarryed)
                                        (ADVP-LOC (ADV there)))))
                  (WADVP-2 (WADV how))
                  (C 0)
                  (IP-SUB (ADVP *T*-2)
                          (PP *ICH*-1)
                          (NP-SBJ (NPR Nero))
                          (BED was)
                          (VAN (VAN destroyed) (CONJ and) (VAN slayne))
                          (PP (P with)
                              (NP (Q all) (PRO$ his) (N oste)))))
          (E_S .))
  (ID CMMALORY,57.1905)) 

( (IP-MAT (CONJ &)
          (NP-SBJ *con*)
          (VBD (VBD +girnde) (CONJ &) (VBD walde))
          (ADVP (ADV +georne))
          (CP-THT (PP-2 (P +gef)
                        (CP-ADV (C 0)
                                (IP-SUB (NP-SBJ (NPR$ godes) (N wil))
                                        (BED were))))
                  (, ;)
                  (C +tt)
                  (IP-SUB (PP *ICH*-2)
                          (NP-SBJ (PRO ha))
                          (MD moste)
                          (BE beon)
                          (NP-OB1 (ONE an)
                                  (PP (P of)
                                      (NP (D +te) (Q moni) (N+NS moder-bern)
                                          (CP-REL (WNP-1 0)
                                                  (C +tt)
                                                  (IP-SUB (NP-SBJ *T*-1)
                                                          (NP-OB1 (QP (ADVR swa) (Q muchel)))
                                                          (VBD drohten)
                                                          (PP (P for)
                                                              (NP (NPR drihtin)))))))))))
  (ID CMMARGA,56.22)) 

CP recursion

Instances of CP recursion are given the following schematic structure. CP recursion generally occurs with THAT complements, but is attested in indirect questions and other clause types as well.
(CP-THT (C that)
        (CP-THT (TOPIC-1 ...)
                (C that)
                (IP-SUB (TOPIC *ICH*-1)
                        ...)))

The higher complementizer must be overt. Otherwise, the token is treated as fronting to Spec(CP), without recursive structure. As in the simple fronting case, unless the constituent sandwiched between the complementizers is left-dislocated (-LFD) and associated with a resumptive (-RSP) element, it is coindexed with an *ICH* trace.

( (IP-MAT (CONJ for)
          (NP-SBJ (NPR sain) (NPR paul))
          (VBP sais)
          (CP-THT (C +tat)
                  (CP-THT (NP-LFD (PRO +tai)
                                  (CP-REL (WNP-1 0)
                                          (C +tat)
                                          (IP-SUB (NP-SBJ *T*-1)
                                                  (DOP dos)
                                                  (NP-OB1 (ADJ wicke) (NS dedis)))))
                          (, ,)
                          (C +tat)
                          (IP-SUB (NP-SBJ-RSP (PRO tay))
                                  (VBP giue)
                                  (NP-OB1 (PRO+N +tam-selffe))
                                  (PP (P til)
                                      (NP (D +te) (NPR deuil))))))
          (E_S ,))
  (ID CMBENRUL,21.735)) 

(NODE (IP-SUB (NP-SBJ-2 (EX tare))
              (BEP be)
              (NP-2 (Q lytil) (N entirual))
              (, ,)
              (CP-ADV (C +tat)
                      (CP-ADV (NP-LFD (D ta)
                                      (CP-REL (WNP-3 0)
                                              (C +tat)
                                              (IP-SUB (NP-SBJ *T*-3)
                                                      (MD sal)
                                                      (VB ga)
                                                      (PP (P til)
                                                          (NP (NS laburs))))))
                              (, ,)
                              (C +tat)
                              (IP-SUB (NP-SBJ-RSP (PRO tay))
                                      (MD may)
                                      (HV haue)
                                      (NP-OB1 (D +te) (N morning))
                                      (PP (P in)
                                          (NP (D +te) (N begining)
                                              (PP (P of)
                                                  (NP (D +te) (N lyth)))))
                                      (PP (P to)
                                          (NP (PRO$ +tair) (N labur)))))))
      (ID CMBENRUL,15.546)) 

( (IP-MAT (ADVP-TMP (ADV +Ta))
          (VBD be+tohte)
          (NP-SBJ (PRO he))
          (NP-OB2 (PRO him))
          (CP-THT (C +tet)
                  (CP-THT (PP-1 (P gif)
                                (LB |)
                                (CP-ADV (C 0)
                                        (IP-SUB (NP-SBJ (PRO he))
                                                (MD mihte)
                                                (BE ben)
                                                (ADJP (N+ADJ rotfest))
                                                (PP (P on)
                                                    (NP (NPR Engleland))))))
                          (C +tet)
                          (IP-SUB (PP *ICH*-1)
                                  (NP-SBJ (PRO he))
                                  (MD mihte)
                                  (HV habben)
                                  (NP-OB1 (Q eal) (PRO$ his) (N wille)))))
          (E_S .))
  (ID CMPETERB,49.224)) 

(NODE (IP-SUB (NP-SBJ (PRO +du))
              (ADVP (ADV wel))
              (BEP be)
              (VAN iwarned)
              (, ,)
              (CP-THT (PP-2 (P +gif)
                            (CP-ADV (C 0)
                                    (IP-SUB (NP-SBJ (NPR godd))
                                            (NP-OB2 (PRO +de))
                                            (VBP +gif+d)
                                            (NP-OB1 (D +tese) (ADJ swete) (NS teares)))))
                      (, ,)
                      (C +tat)
                      (IP-SUB (PP *ICH*-2)
                              (NP-SBJ (Q non) (N win)
                                      (PP (P in)
                                          (NP (D +dare) (N world))))
                              (NEG+BEP nis)
                              (ADJP (ADVR swa) (ADJ swete)))))
      (ID CMVICES1,149.1856)) 

(NODE (IP-INF (NP-SBJ (PRO us))
              (VB considere)
              (ALSO also)
              (CP-QUE (C if)
                      (CP-QUE (NP-LFD (D the) (N conseillung)
                                      (PP (P of)
                                          (NP (PRO hem)
                                              (CP-REL (WNP-1 0)
                                                      (C that)
                                                      (IP-SUB (NP-SBJ *T*-1)
                                                              (VBD conseilleden)
                                                              (NP-OB2 (PRO yow))
                                                              (IP-INF (TO to)
                                                                      (VB taken)
                                                                      (NP-OB1 (ADJ sodeyn) (N vengeaunce))))))))
                              (, ,)
                              (WQ wheither)
                              (C 0)
                              (IP-SUB (NP-SBJ-RSP (PRO it))
                                      (VBP accorde)
                                      (PP (P to)
                                          (NP (N resoun)))))))
      (ID CMCTMELI,228.C2.447))
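
The difference between CP recursion and simple fronting to Spec(CP) can be stated as a small structural test. The sketch below is an illustration under stated assumptions, not a description of any existing tool: the names `parse_tree`, `subtrees`, and `is_cp_recursion` are hypothetical, and recursion is identified as a CP that has both an overt C daughter and a CP daughter with an identical label, which is the pattern shown in the examples above. The first tree is abridged from CMPETERB,49.224 and the second from CMVICES1,149.1856, both without wrapper and ID node.

```python
import re

def parse_tree(text):
    """Read a labelled bracketing into nested lists: [label, child, ...]."""
    stack, root = [], None
    for tok in re.findall(r"[()]|[^\s()]+", text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            if stack:
                stack[-1].append(node)
            else:
                root = node
        else:
            stack[-1].append(tok)
    return root

def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in node[1:]:
        if isinstance(child, list):
            yield from subtrees(child)

def is_cp_recursion(node):
    """True if node is a CP with an overt C and a CP daughter of the same label."""
    label = node[0]
    if not label.startswith("CP"):
        return False
    daughters = [c for c in node[1:] if isinstance(c, list)]
    overt_c = any(d[0] == "C" and d[1] != "0" for d in daughters)
    same_cp = any(d[0] == label for d in daughters)
    return overt_c and same_cp

recursion = parse_tree("""
(CP-THT (C +tet)
        (CP-THT (PP-1 (P gif)
                      (CP-ADV (C 0)
                              (IP-SUB (NP-SBJ (PRO he))
                                      (MD mihte)
                                      (BE ben)
                                      (ADJP (N+ADJ rotfest)))))
                (C +tet)
                (IP-SUB (PP *ICH*-1)
                        (NP-SBJ (PRO he))
                        (MD mihte)
                        (HV habben)
                        (NP-OB1 (Q eal) (PRO$ his) (N wille)))))
""")
fronting_only = parse_tree("""
(CP-THT (PP-2 (P +gif)
              (CP-ADV (C 0)
                      (IP-SUB (NP-SBJ (NPR godd))
                              (NP-OB2 (PRO +de))
                              (VBP +gif+d))))
        (C +tat)
        (IP-SUB (PP *ICH*-2)
                (NEG+BEP nis)
                (ADJP (ADVR swa) (ADJ swete))))
""")

for tree in (recursion, fronting_only):
    hits = [n[0] for n in subtrees(tree) if is_cp_recursion(n)]
    print(hits)
```

Requiring an overt C on the higher CP mirrors the condition stated above that the higher complementizer must be overt. It also keeps the test from firing on coordination structures, where a CP dominates conjoined CPs of the same label but has no complementizer of its own.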