Coding is used for creating input to multivariate analysis programs like Varbrul; general statistical programming environments like S, Splus, and R; and statistical analysis packages like Datadesk, JMP, SAS, and SPSS.
The values in a coding string may be determined partly automatically, by coding queries, and partly by hand in a text editor. The resulting files can then serve as input to further searches. The following coding query file defines four columns:
node: IP*

coding_query:

1: {
    s: (IP*SPE* iDoms NP-OB*)
    n: ELSE
}

2: {
    m: (IP-MAT* iDoms NP-OB*)
    s: (IP-SUB* iDoms NP-OB*)
    i: (IP-INF* iDoms NP-OB*)
    e: ELSE
}

3: {
    t: ((IP* iDoms NEG) AND (NEG iDoms !ne))
    p: (IP* iDoms !NEG)
    n: ELSE
}

4: {
    \1: (NP-OB* domsWords 1)
    \2: (NP-OB* domsWords 2)
    \3: (NP-OB* domsWords> 2)
    \0: ELSE
}
In general, coding files have this form:
<PREAMBLE>

coding_query:

column_number: {
    label: condition
    label: condition
    .
    .
    .
}
The coding file begins with the preamble commands (see the Command File chapter), which must include the obligatory boundary node ("node:") for the coding queries. The obligatory query specification "coding_query:" then introduces the coding queries, one for each column of the output coding string.
In the present example, column 1 of the coding string will contain an "s" if the condition (IP*SPE* iDoms NP-OB*) holds. Otherwise, because of the "ELSE" function (used only in coding queries), the column will contain an "n".
Note that when numerals (0-9) are used as codes, they must be introduced with the backslash character ("\"), as illustrated in column 4 above.
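The way a coding string is assembled column by column can be sketched in Python. This is only an illustrative model, not CorpusSearch's implementation: the dictionary standing in for a boundary node, the function name code_node, and the "?" marker are hypothetical, and the sketch assumes that the alternatives in a column are tried in order with the first match winning, which this chapter does not state explicitly. The colon separator follows the coding strings shown in this chapter.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a boundary node: a dict of facts that the toy
# conditions below inspect. CorpusSearch itself evaluates search functions
# such as iDoms against the parsed tree instead.
Node = Dict[str, object]
Condition = Callable[[Node], bool]


def ELSE(node: Node) -> bool:
    """Fallback condition that always matches, like ELSE in a coding query."""
    return True


def code_node(node: Node, columns: List[List[Tuple[str, Condition]]]) -> str:
    """Build a coding string with one label per column, joined with ':'."""
    labels = []
    for alternatives in columns:
        # Assumption: alternatives are tried in order and the first match wins.
        for label, condition in alternatives:
            if condition(node):
                labels.append(label)
                break
        else:
            labels.append("?")  # hypothetical marker: no condition matched
    return ":".join(labels)


# Toy versions of columns 1 and 2 of the example coding file.
columns = [
    [("s", lambda n: bool(n["speech"]) and bool(n["has_object"])), ("n", ELSE)],
    [
        ("m", lambda n: n["clause"] == "MAT" and bool(n["has_object"])),
        ("s", lambda n: n["clause"] == "SUB" and bool(n["has_object"])),
        ("i", lambda n: n["clause"] == "INF" and bool(n["has_object"])),
        ("e", ELSE),
    ],
]

node = {"speech": False, "clause": "SUB", "has_object": True}
print(code_node(node, columns))  # n:s
```

Because every column in the example ends with an ELSE alternative, each boundary node receives exactly one label per column.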
Coding query files are alternatives to ordinary query files in a CorpusSearch run. So, to code a file, invoke CorpusSearch as follows:
java CorpusSearch <coding_file.c> <file_to_code>
Output files resulting from coding will carry the extension .cod. They contain every token of the input file, with coding nodes inserted at every boundary node. A coding node has the form:
(CODING <coding_string>)
If a given sentence contains more than one boundary node, the output sentence will contain multiple coding nodes. Here's a sentence from the output file resulting from the above coding file:
/~*
knewe kyndes & complexciones of men & of bestus
(CMHORSES,85.2)
*~/
( (IP-SUB (CODING n:s:p)
          (NP-SBJ *T*-1)
          (VBD knewe)
          (NP-OB1 (NS kyndes)
                  (CONJ &)
                  (NS complexciones)
                  (PP (PP (P of)
                          (NP (NS men)))
                      (CONJP (CONJ &)
                             (PP (P of)
                                 (NP (NS bestus)))))))
  (ID CMHORSES,85.2))
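When the coding strings are needed outside CorpusSearch, they can be pulled out of a .cod file with a short script. The following Python sketch relies only on the (CODING <coding_string>) form described above. The function name extract_coding_strings is hypothetical, the sample text is an abbreviated version of the sentence shown here, and the pattern assumes that a coding string contains no whitespace or parentheses.

```python
import re

# Matches a coding node of the form (CODING <coding_string>).
# Assumption: the coding string has no whitespace or parentheses in it.
CODING_NODE = re.compile(r"\(CODING\s+([^()\s]+)\)")


def extract_coding_strings(text):
    """Return every coding string found in the text, in order of appearance."""
    return CODING_NODE.findall(text)


# Abbreviated version of the output sentence, for illustration only.
sample = "( (IP-SUB (CODING n:s:p) (NP-SBJ *T*-1) (VBD knewe)) (ID CMHORSES,85.2))"
print(extract_coding_strings(sample))  # ['n:s:p']
```

A sentence with several boundary nodes yields several entries, one per coding node.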
Coding strings may be searched using the column search function. For instance, to find all boundary nodes whose coding string contains "m" or "p" in the 7th column, use this query:
query: (CODING column 7 m|p)
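The test that such a query performs on one coding string can be approximated outside CorpusSearch as follows. This is a rough sketch rather than CorpusSearch's own matching logic: the function name column_matches is hypothetical, columns are assumed to be separated by colons and numbered from 1 (as in the examples in this chapter), and only exact matches against the alternatives are handled.

```python
def column_matches(coding_string, column, alternatives):
    """Check whether the given 1-based column holds one of the alternatives.

    `alternatives` uses the same "m|p" notation as the query above.
    Assumption: columns are separated by ':' and compared exactly.
    """
    fields = coding_string.split(":")
    if column < 1 or column > len(fields):
        return False  # the coding string has no such column
    return fields[column - 1] in alternatives.split("|")


print(column_matches("a:b:c:d:e:f:m", 7, "m|p"))  # True
print(column_matches("n:s:p", 7, "m|p"))          # False
print(column_matches("n:s:p", 3, "m|p"))          # True
```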
To obtain a file with only the coding strings, use print_only as follows:
print_only: CODING
The extension of the resultant output file will be .ooo.
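Statistical packages generally expect one column per factor, so a list of coding strings usually has to be split into fields before analysis. The Python sketch below converts coding strings to CSV text. It takes the strings as an already extracted list, because the exact layout of the .ooo file is not described in this chapter; the function name coding_strings_to_csv is hypothetical, and the colon separator is assumed from the examples in this chapter.

```python
import csv
import io


def coding_strings_to_csv(coding_strings):
    """Turn coding strings such as 'n:s:p' into CSV text, one row per string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for coding_string in coding_strings:
        # Assumption: columns are separated by ':'.
        writer.writerow(coding_string.split(":"))
    return buffer.getvalue()


print(coding_strings_to_csv(["n:s:p", "n:m:t"]))
# n,s,p
# n,m,t
```

The resulting text can be saved to a file and read by a CSV import routine.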