CorpusSearch 2 has a corpus-revision feature, which allows the user to make automatic changes to a corpus. This is useful, for instance, in correcting parser errors, or revising a corpus to fit new annotation guidelines.
Revisions are linked to a standard CorpusSearch (CS) query, which is decorated with curly-bracket tags indicating where revisions should take place. Each pair of curly brackets contains an index that links an argument in the query to a revision instruction. I'll call the curly-bracket construction a "flag". This is the general idea:
query: ({x}A function B) AND (C function {y}D)
revise{x}: info
revise{y}: info
Also see the examples.
Suppose you have a query where the same node is mentioned several times. You may be tempted to flag the node every time it appears in the query, as below:
WRONG!
query: (NP* iDoms {1}[1]Q)
AND (NP* iDoms {2}[2]Q)
AND ({1}[1]Q iPrecedes {2}[2]Q)
add_internal_node{1, 2}: QP
The problem with this is that CorpusSearch only needs to have the arguments flagged once, and repeating the flags just increases the possibility of error (for instance, the same flag might wind up referring to two different nodes). For this reason, CorpusSearch ignores repeated flags, and issues a warning when they are encountered. The above query produces these WARNING messages:
WARNING! Subsequent flag {1} has been ignored.
WARNING! Subsequent flag {2} has been ignored.
This version of the query is preferred:
query: (NP* iDoms {1}[1]Q)
AND (NP* iDoms {2}[2]Q)
AND ([1]Q iPrecedes [2]Q)
add_internal_node{1, 2}: QP
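The flag-once rule can be pictured with a small Python sketch. This is not CorpusSearch's own parser; the function name `collect_flags` and the regular expression are my assumptions, and the sketch only imitates the behaviour described above: the first occurrence of each flag is kept, and every later occurrence produces a warning.

```python
import re

# Assumption: a flag is a curly-bracket pair around a simple index, e.g. {1} or {x}.
FLAG = re.compile(r"\{(\w+)\}")

def collect_flags(query):
    """Return (flag indices in first-seen order, warnings for repeated flags)."""
    seen, warnings = [], []
    for match in FLAG.finditer(query):
        index = match.group(1)
        if index in seen:
            # Later occurrences are ignored, mirroring the warning text in the manual.
            warnings.append(f"WARNING! Subsequent flag {{{index}}} has been ignored.")
        else:
            seen.append(index)
    return seen, warnings

bad = "(NP* iDoms {1}[1]Q) AND (NP* iDoms {2}[2]Q) AND ({1}[1]Q iPrecedes {2}[2]Q)"
good = "(NP* iDoms {1}[1]Q) AND (NP* iDoms {2}[2]Q) AND ([1]Q iPrecedes [2]Q)"

flags_bad, warnings_bad = collect_flags(bad)
flags_good, warnings_good = collect_flags(good)
```

Both queries yield the same two flags; only the first one yields warnings, which is why the second form is preferred.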
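The simplest way to change a tree is to change labels, leaving the structure intact. CS has the following label-changing revision functions, each illustrated below: replace_label, append_label, prepend_label, pre_crop_label, and post_crop_label.
node: IP*
query: ({1}NP-ACC iDoms N*)
replace_label{1}: BULLWINKLE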
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (NP-ACC (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (BULLWINKLE (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
node: $ROOT
query: ({1}WPRO iDoms what|What)
AND (WPRO iPrecedes IP*)
append_label{1}: -THAT
( (IP-MAT (CONJ but) (CP-QUE (WNP-1 (WPRO what)) (IP-SUB (NP-TMP *T*-1) (NP-SBJ (PRO I)) (MD shall) (VB returne) (NP-DIR (N home)))) (NP-SBJ (PRO I)) (BEP am) (ADJP (NP-MSR (D a) (Q little)) (ADJ doubtfull)) (. .)) (ID KNYVETT-1630,94.268))
/~*
but what I shall returne home I am a little doubtfull. (KNYVETT-1630,94.268)
*~/
/*
1 IP-MAT: 6 WPRO, 7 what, 8 IP-SUB
*/
( (IP-MAT (CONJ but) (CP-QUE (WNP-1 (WPRO-THAT what)) (IP-SUB (NP-TMP *T*-1) (NP-SBJ (PRO I)) (MD shall) (VB returne) (NP-DIR (N home)))) (NP-SBJ (PRO I)) (BEP am) (ADJP (NP-MSR (D a) (Q little)) (ADJ doubtfull)) (. .)) (ID KNYVETT-1630,94.268))
node: $ROOT
ignore_nodes: null
query: ({1}[1], iDoms [2],)
AND ([1], iPres *-PRN)
AND (*-PRN iPres [3],)
AND ({2}[3], iDoms [4],)
prepend_label{1}: PRN-
prepend_label{2}: PRN-
( (IP-MAT (CONJ &) (NP-SBJ (PRO$ my) (NS horsses)) (, ,) (IP-MAT-PRN (NP-SBJ (PRO I)) (VBP thinke)) (, ,) (MD $wil) (BE $be) (CODE {TEXT:wilbe}) (VBN gone) (PP (P to) (NP (N morrowe))) (. ,)) (ID KNYVETT-1630,93.228))
/~*
& my horsses, I thinke, $wil $be gone to morrowe, (KNYVETT-1630,93.228)
*~/
/*
1 IP-MAT: 9 ,, 10 ,, 11 IP-MAT-PRN, 17 ,, 18 ,
*/
( (IP-MAT (CONJ &) (NP-SBJ (PRO$ my) (NS horsses)) (PRN-, ,) (IP-MAT-PRN (NP-SBJ (PRO I)) (VBP thinke)) (PRN-, ,) (MD $wil) (BE $be) (CODE {TEXT:wilbe}) (VBN gone) (PP (P to) (NP (N morrowe))) (. ,)) (ID KNYVETT-1630,93.228))
node: $ROOT
query: (ADVP* iDoms {1}ADV+*)
pre_crop_label{1}: +
( (IP-MAT (CONJ &) (NP-SBJ (Q many)) (VBD lost) (NP-ACC (PRO$ ther) (NS lifes)) (PP (PP (P aboute) (NP (D the) (NS Teames))) (CONJP (CONJ &) (ADVP-LOC (ADV+WADV elsewher)))) (. .)) (ID KNYVETT-1630,87.21))
/~*
& many lost ther lifes aboute the Teames & elsewher. (KNYVETT-1630,87.21)
*~/
/*
1 IP-MAT: 26 ADVP-LOC, 27 ADV+WADV
*/
( (IP-MAT (CONJ &) (NP-SBJ (Q many)) (VBD lost) (NP-ACC (PRO$ ther) (NS lifes)) (PP (PP (P aboute) (NP (D the) (NS Teames))) (CONJP (CONJ &) (ADVP-LOC (WADV elsewher)))) (. .)) (ID KNYVETT-1630,87.21))
This query crops -ACC from the flagged label and then appends -OBJ in its place:
node: IP*
query: ({1}NP-ACC iDoms N*)
post_crop_label{1}: -
append_label{1}: -OBJ
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (NP-ACC (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (NP-OBJ (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
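The label-changing functions illustrated above behave like simple string operations on a node label. The Python sketch below mimics them on the labels from the examples; it is an illustration, not CorpusSearch code. One point is an assumption on my part: the examples contain only one "+" or "-" per label, so they do not show which occurrence is used when a label contains several. The sketch crops at the first occurrence for pre_crop_label and at the last occurrence for post_crop_label.

```python
def replace_label(label, new_label):
    """Swap the whole label for a new one."""
    return new_label

def append_label(label, suffix):
    """Add a string to the end of the label."""
    return label + suffix

def prepend_label(label, prefix):
    """Add a string to the front of the label."""
    return prefix + label

def pre_crop_label(label, mark):
    """Drop everything up to and including mark (assumed: first occurrence)."""
    _, found, rest = label.partition(mark)
    return rest if found else label

def post_crop_label(label, mark):
    """Drop mark and everything after it (assumed: last occurrence)."""
    head, found, _ = label.rpartition(mark)
    return head if found else label

# Labels taken from the examples above.
replaced = replace_label("NP-ACC", "BULLWINKLE")
appended = append_label("WPRO", "-THAT")
prepended = prepend_label(",", "PRN-")
pre_cropped = pre_crop_label("ADV+WADV", "+")
crop_then_append = append_label(post_crop_label("NP-ACC", "-"), "-OBJ")
```

Chaining post_crop_label and append_label, as in the last line, reproduces the NP-ACC to NP-OBJ change shown above.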
It is possible for a described change to result in an illegal tree, that is, a tree with crossing branches, or a tree containing an internal node with no leaf descendants (a pollarded tree?). If this is the case, a warning is given and the tree is not changed.
node: IP*
query: (PP iDoms {1}P)
add_leaf_before{1}: (X BULLWINKLE)
add_leaf_after{1}: (Q ROCKY)
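To make the second kind of illegal tree concrete, here is a minimal Python sketch that checks for internal nodes with no leaf descendants. The tree representation is my assumption, not CorpusSearch's: a leaf is a two-item list `[tag, text]`, and an internal node is a list whose first item is the label and whose remaining items are child nodes. Crossing branches cannot arise in this nested representation, so the sketch does not test for them.

```python
def is_leaf(node):
    """A leaf pairs a tag with a text string, e.g. ["D", "that"]."""
    return len(node) == 2 and isinstance(node[1], str)

def has_leaf(node):
    """True if the node is a leaf or dominates at least one leaf."""
    return is_leaf(node) or any(has_leaf(child) for child in node[1:])

def is_legal(node):
    """True if every internal node in the tree dominates at least one leaf."""
    if is_leaf(node):
        return True
    return has_leaf(node) and all(is_legal(child) for child in node[1:])

good_tree = ["PP", ["P", "Unto"], ["NP", ["D", "that"]]]
pollarded = ["PP", ["P", "Unto"], ["NP"]]  # NP has lost its only leaf

good_ok = is_legal(good_tree)
pollarded_ok = is_legal(pollarded)
```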
( (IP-MAT (PP (P Unto) (NP (D that))) (NP-SBJ (PRO they) (QP (Q all))) (ADVP (ADV well)) (VBD accordyd)) (ID CMMALORY,5.110) )
/~*
BULLWINKLE Unto ROCKY that they all well accordyd (CMMALORY,5.110)
*~/
/*
1 IP-MAT: 2 PP, 3 P
*/
( (IP-MAT (PP (X BULLWINKLE) (P Unto) (Q ROCKY) (NP (D that))) (NP-SBJ (PRO they) (QP (Q all))) (ADVP (ADV well)) (VBD accordyd)) (ID CMMALORY,5.110))
node: IP*
query: (NP iDoms {1}D)
move_up_node{1}:
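The effect of add_leaf_before and add_leaf_after in the example above can be sketched in Python as inserting a new sibling next to the flagged node. The nested-list tree format and the helper name `add_leaf` are assumptions made for illustration; they are not part of CorpusSearch.

```python
def add_leaf(parent, target, new_leaf, after=False):
    """Insert new_leaf as a sibling immediately before (or after) target."""
    # Find the target by identity, so identical-looking siblings are not confused.
    position = next(i for i, child in enumerate(parent) if child is target)
    parent.insert(position + 1 if after else position, new_leaf)

# The PP from the example: (PP (P Unto) (NP (D that)))
p = ["P", "Unto"]
pp = ["PP", p, ["NP", ["D", "that"]]]

add_leaf(pp, p, ["X", "BULLWINKLE"])          # add_leaf_before{1}: (X BULLWINKLE)
add_leaf(pp, p, ["Q", "ROCKY"], after=True)   # add_leaf_after{1}: (Q ROCKY)
```

After both calls, `pp` has the same shape as the PP in the output tree above.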
( (IP-MAT (ADVP-TMP (ADV Thenne)) (PP (P in) (NP (Q all) (N haste))) (VBD came) (NP-SBJ (NPR Uther)) (PP (P with) (NP (D a) (ADJ grete) (N hoost)))) (ID CMMALORY,3.37))
( (IP-MAT (ADVP-TMP (ADV Thenne)) (PP (P in) (NP (Q all) (N haste))) (VBD came) (NP-SBJ (NPR Uther)) (PP (P with) (D a) (NP (ADJ grete) (N hoost)))) (ID CMMALORY,3.37))
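The move_up_node example above can be sketched in Python as follows. The nested-list tree format is an assumption for illustration. The two refusal cases are my reading of the illegal-tree rule, not documented behaviour: moving a middle child up would create crossing branches, and moving an only child up would leave its parent with no leaf descendants.

```python
def move_up_node(grandparent, parent, node):
    """Move node out of parent so it becomes parent's sibling.

    Returns False, leaving the tree unchanged, when the move would be illegal.
    """
    children = parent[1:]  # item 0 is the label
    if len(children) < 2:
        return False  # parent would be left without leaf descendants
    slot = next(i for i, child in enumerate(grandparent) if child is parent)
    if children[0] is node:
        parent.pop(1)
        grandparent.insert(slot, node)       # lands just before parent
    elif children[-1] is node:
        parent.pop()
        grandparent.insert(slot + 1, node)   # lands just after parent
    else:
        return False  # a middle child cannot move up without crossing branches
    return True

# (PP (P with) (NP (D a) (ADJ grete) (N hoost)))
d = ["D", "a"]
np = ["NP", d, ["ADJ", "grete"], ["N", "hoost"]]
pp = ["PP", ["P", "with"], np]
moved = move_up_node(pp, np, d)
```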
node: IP*
query: ({1}Q iprecedes {2}ADJ)
move_up_nodes{1, 2}:
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (NP-ACC (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (Q no) (ADJ greate) (NP-ACC (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
node: IP*
query: ({1}MD HasSister {2}VB)
add_internal_node{1, 2}: MDVP
( (IP-MAT-SPE (' ') (NP-VOC (N Sir)) (, ,) (' ') (IP-MAT-PRN (VBD said) (NP-SBJ (NPR Ulfius))) (, ,) (' ') (NP-SBJ (PRO he)) (MD wille) (NEG not) (VB dwelle) (NP-MSR (ADJ long)) (E_S .) (' ')) (ID CMMALORY,3.66))
( (IP-MAT-SPE (' ') (NP-VOC (N Sir)) (, ,) (' ') (IP-MAT-PRN (VBD said) (NP-SBJ (NPR Ulfius))) (, ,) (' ') (NP-SBJ (PRO he)) (MDVP (MD wille) (NEG not) (VB dwelle)) (NP-MSR (ADJ long)) (E_S .) (' ')) (ID CMMALORY,3.66))
To add an internal node spanning just one existing node, list the same index twice. For instance, this query:
query: (IP* iDoms {1}BE*)
add_internal_node{1, 1}: VP
( (IP-MAT-SPE (CONJ but) (ADVP (ADV truly)) (NP-VOC (N gossip)) (NP-SBJ (PRO you)) (BEP are) (ADJP (ADJ welcome)) (. ,)) (ID DELONEY,69.9))
/~*
but truly gossip you are welcome, (DELONEY,69.9)
*~/
/*
1 IP-MAT-SPE: 1 IP-MAT-SPE, 13 BEP
*/
( (IP-MAT-SPE (CONJ but) (ADVP (ADV truly)) (NP-VOC (N gossip)) (NP-SBJ (PRO you)) (VP (BEP are)) (ADJP (ADJ welcome)) (. ,)) (ID DELONEY,69.9))
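Both add_internal_node examples above, the span from MD to VB and the single BEP, can be pictured as wrapping a contiguous run of sisters in a new node. Here is a minimal Python sketch, assuming a nested-list tree format of my own; it is not CorpusSearch's implementation, and it assumes the two flagged nodes are sisters with the first one on the left.

```python
def add_internal_node(parent, first, last, label):
    """Wrap the sisters from first through last in a new node called label."""
    start = next(i for i, child in enumerate(parent) if child is first)
    end = next(i for i, child in enumerate(parent) if child is last)
    # Replace the run of sisters with one new node that dominates them.
    parent[start:end + 1] = [[label] + parent[start:end + 1]]

# add_internal_node{1, 2}: MDVP over (MD wille) (NEG not) (VB dwelle)
md, neg, vb = ["MD", "wille"], ["NEG", "not"], ["VB", "dwelle"]
ip = ["IP-MAT-SPE", ["NP-SBJ", ["PRO", "he"]], md, neg, vb, ["NP-MSR", ["ADJ", "long"]]]
add_internal_node(ip, md, vb, "MDVP")

# add_internal_node{1, 1}: VP over the single leaf (BEP are)
bep = ["BEP", "are"]
ip2 = ["IP-MAT-SPE", ["NP-SBJ", ["PRO", "you"]], bep, ["ADJP", ["ADJ", "welcome"]]]
add_internal_node(ip2, bep, bep, "VP")
```

Passing the same node as both ends corresponds to listing the same index twice in the query file.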
If the indicated leaf is an only child, a warning is given and the tree is not changed.
This query:
node: IP*
ignore_nodes: null
query: (NP* iDoms {1}\**)
delete_leaf{1}:
( (CP-QUE-SPE (INTJP (INTJ Tush)) (NP-VOC (N woman)) (, ,) (WNP-1 (WPRO what)) (IP-SUB-SPE (NP-ACC *T*-1) (VBP talke) (NP-SBJ (PRO you)) (PP (P of) (NP (D that)))) (. ?)) (ID DELONEY,70.40))
/~*
Tush woman, what talke you of that? (DELONEY,70.40)
*~/
/*
13 IP-SUB-SPE: 14 NP-ACC, 15 *T*-1
*/
( (CP-QUE-SPE (INTJP (INTJ Tush)) (NP-VOC (N woman)) (, ,) (WNP-1 (WPRO what)) (IP-SUB-SPE (VBP talke) (NP-SBJ (PRO you)) (PP (P of) (NP (D that)))) (. ?)) (ID DELONEY,70.40))
This query:
node: FRAG*
query: ({1}ADVP* iDoms ADV*)
delete_node{1}:
( (FRAG-SPE (WNP (WPRO What)) (ADVP-TMP (ADV neuer)) (NP (D a) (ADJ great) (N belly)) (ADVP (ADV yet)) (. ?)) (ID DELONEY,69.5))
/~*
What neuer a great belly yet? (DELONEY,69.5)
*~/
/*
1 FRAG-SPE: 5 ADVP-TMP, 6 ADV
1 FRAG-SPE: 15 ADVP, 16 ADV
*/
( (FRAG-SPE (WNP (WPRO What)) (ADV neuer) (NP (D a) (ADJ great) (N belly)) (ADV yet) (. ?)) (ID DELONEY,69.5))
This query:
node: IP*
query: ({1}CONJP* iDoms CONJ*)
delete{1}:
( (IP-MAT (NP-SBJ (PRO I)) (VBP hear) (CP-THT (C 0) (IP-SUB (NP-SBJ (NP (N Lady) (N Banbery)) (CONJP-1 (CONJ and) (NP (D y=e=) (N Wardon) (PP (P of) (NP (NPR All) (NPRS Souls)))))) (BEP is) (ADJP (ADJ dead)))) (. .)) (ID ALHATTON,2,242.21))
( (IP-MAT (NP-SBJ (PRO I)) (VBP hear) (CP-THT (C 0) (IP-SUB (NP-SBJ (NP (N Lady) (N Banbery))) (BEP is) (ADJP (ADJ dead)))) (. .)) (ID ALHATTON,2,242.21))
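The two examples above differ in what happens to the material under the flagged node: delete_node removes the node but keeps its children in place, while delete removes the node together with everything it dominates. The Python sketch below shows that contrast on nested lists. The tree format is my assumption, and the sketch assumes the flagged node is an internal node, not a leaf.

```python
def delete_node(parent, node):
    """Remove node but keep its children, spliced into node's old position."""
    slot = next(i for i, child in enumerate(parent) if child is node)
    parent[slot:slot + 1] = node[1:]  # item 0 of node is its label

def delete(parent, node):
    """Remove node together with everything it dominates."""
    slot = next(i for i, child in enumerate(parent) if child is node)
    del parent[slot]

# delete_node: (ADVP-TMP (ADV neuer)) becomes (ADV neuer)
advp = ["ADVP-TMP", ["ADV", "neuer"]]
frag = ["FRAG-SPE", ["WNP", ["WPRO", "What"]], advp, [".", "?"]]
delete_node(frag, advp)

# delete: the CONJP-1 subtree disappears entirely
conjp = ["CONJP-1", ["CONJ", "and"], ["NP", ["D", "y=e="], ["N", "Wardon"]]]
np_sbj = ["NP-SBJ", ["NP", ["N", "Lady"], ["N", "Banbery"]], conjp]
delete(np_sbj, conjp)
```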
Old:
(PP (P+D-P dos) (NP (ADJ-P grandes) (N-P homens)
New:
(PP (P $de) (NP (D-P os) (ADJ-P grandes) (N-P homens)
To make the above change, use this query file:
node: IP*
//copy_corpus: t
query: (PP iDoms {1}P+D-P)
AND (P+D-P iDoms {2}dos)
AND (P+D-P iPres NP)
AND (NP iDomsFirst {3}*)
replace_label{1}: P
replace_label{2}: $de
add_leaf_before{3}: (D-P os)
The query file as shown will produce a standard CS output file. To produce a file containing the input corpus file, with the changes described, un-comment "copy_corpus: t".