Coding is used for creating input to multivariate analysis programs like Varbrul; general statistical programming environments like S, Splus, and R; and statistical analysis packages like Datadesk, JMP, SAS, and SPSS.
The values in a coding string may be determined partly automatically, by the coding queries in a coding file, and partly by hand, in a text editor. The resulting files can then serve as input to further searches. Here is an example of a coding file:
node: IP*
coding_query:
1: {
s: (IP*SPE* iDoms NP-OB*)
n: ELSE
}
2: {
m: (IP-MAT* iDoms NP-OB*)
s: (IP-SUB* iDoms NP-OB*)
i: (IP-INF* iDoms NP-OB*)
e: ELSE
}
3: {
t: ((IP* iDoms NEG)
AND (NEG iDoms !ne))
p: (IP* iDoms !NEG)
n: ELSE
}
4: {
\1: (NP-OB* domsWords 1)
\2: (NP-OB* domsWords 2)
\3: (NP-OB* domsWords> 2)
\0: ELSE
}
In general, coding files have this form:
<PREAMBLE>
coding_query:
column_number: {
label: condition
label: condition
...
}
The coding file begins with the preamble commands (see the Command File chapter), which must include the obligatory boundary node for the coding queries (in the example above, node: IP*). The obligatory specification "coding_query:" then introduces the coding queries, one block for each column of the output coding string.
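To make this layout concrete, here is a minimal Python sketch that reads the coding_query section of a coding file into a table of columns, each holding its label/condition pairs in the order written. It is not part of CorpusSearch: the function name parse_coding_queries is hypothetical, and the sketch assumes that each label is a single character or a backslash plus one digit, and that a condition may continue onto following lines (as in column 3 of the example).

```python
import re

# Hypothetical sketch, not part of CorpusSearch.
# A block such as "3: { ... }" introduces the conditions for one column.
BLOCK = re.compile(r"(\d+):\s*\{(.*?)\}", re.S)
# Assumed label syntax: one character, or a backslash plus a digit ("\1").
LABEL_LINE = re.compile(r"^(\\\d|\w):\s*(.*)$")


def parse_coding_queries(text):
    """Return {column_number: [(label, condition_text), ...]}."""
    body = text.split("coding_query:", 1)[1]  # skip the preamble
    columns = {}
    for block in BLOCK.finditer(body):
        entries = []
        for raw in block.group(2).splitlines():
            line = raw.strip()
            if not line:
                continue
            m = LABEL_LINE.match(line)
            if m:
                label = m.group(1).lstrip("\\")  # "\1" is stored as "1"
                entries.append([label, m.group(2)])
            elif entries:
                # A line without "label:" continues the previous condition.
                entries[-1][1] += " " + line
        columns[int(block.group(1))] = [tuple(e) for e in entries]
    return columns


example = r"""node: IP*
coding_query:
1: {
s: (IP*SPE* iDoms NP-OB*)
n: ELSE
}
4: {
\1: (NP-OB* domsWords 1)
\0: ELSE
}
"""
print(parse_coding_queries(example))
# {1: [('s', '(IP*SPE* iDoms NP-OB*)'), ('n', 'ELSE')], 4: [('1', '(NP-OB* domsWords 1)'), ('0', 'ELSE')]}
```

Keeping the conditions in a list rather than a dictionary preserves their order, which matters once ELSE is involved.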
In the present example, column 1 of the coding string will contain an "s" if the condition (IP*SPE* iDoms NP-OB*) holds, that is, if an IP*SPE* node immediately dominates an NP-OB* node. For every other boundary node, because of the "ELSE" condition (used only in coding queries), the column will contain an "n".
Note that when numerals (0-9) are used as codes, they must be introduced with the backslash character ("\"), as illustrated in column 4 above.
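The following Python sketch shows how a coding string could be assembled for one boundary node: for each column, the conditions are tried in the order listed, the label of the first one that holds is used, and ELSE catches everything else. It is an illustration, not CorpusSearch's implementation. The first-match order is an assumption, the tuple tree format and all function names are hypothetical, shell-style wildcards from Python's fnmatch stand in for CorpusSearch label matching, and only columns 1, 2 and 4 of the example are modelled (column 3 is left out).

```python
from fnmatch import fnmatchcase

# Hypothetical tree format: (label, child, ...), each child a tuple or a word.
ELSE = None  # marks the catch-all condition of a column


def children(node):
    return [c for c in node[1:] if isinstance(c, tuple)]


def subtree(node):
    """The node itself and every node it dominates."""
    yield node
    for c in children(node):
        yield from subtree(c)


def find(boundary, pattern):
    return [n for n in subtree(boundary) if fnmatchcase(n[0], pattern)]


def words(node):
    """Every leaf under node (this sketch does not skip empty categories)."""
    out = []
    for c in node[1:]:
        out.extend(words(c) if isinstance(c, tuple) else [c])
    return out


def ip_idoms_object(ip_pattern):
    """Models (ip_pattern iDoms NP-OB*)."""
    return lambda b: any(
        any(fnmatchcase(c[0], "NP-OB*") for c in children(ip))
        for ip in find(b, ip_pattern))


def object_size(test):
    """Models (NP-OB* domsWords ...), with the comparison given as test."""
    return lambda b: any(test(len(words(np))) for np in find(b, "NP-OB*"))


# Columns 1, 2 and 4 of the example; \1, \2, \3, \0 are stored as "1" etc.
COLUMNS = [
    [("s", ip_idoms_object("IP*SPE*")), ("n", ELSE)],
    [("m", ip_idoms_object("IP-MAT*")), ("s", ip_idoms_object("IP-SUB*")),
     ("i", ip_idoms_object("IP-INF*")), ("e", ELSE)],
    [("1", object_size(lambda k: k == 1)), ("2", object_size(lambda k: k == 2)),
     ("3", object_size(lambda k: k > 2)), ("0", ELSE)],
]


def coding_string(boundary, columns):
    labels = []
    for conditions in columns:
        for label, condition in conditions:
            # Assumed: conditions are tried in order; ELSE always succeeds.
            if condition is ELSE or condition(boundary):
                labels.append(label)
                break
        else:
            raise ValueError("no condition matched and the column has no ELSE")
    return ":".join(labels)


SENTENCE = ("IP-SUB", ("NP-SBJ", "*T*-1"), ("VBD", "knewe"),
            ("NP-OB1", ("NS", "kyndes"), ("CONJ", "&"), ("NS", "complexciones"),
             ("PP", ("PP", ("P", "of"), ("NP", ("NS", "men"))),
              ("CONJP", ("CONJ", "&"),
               ("PP", ("P", "of"), ("NP", ("NS", "bestus")))))))

print(coding_string(SENTENCE, COLUMNS))  # n:s:3
```

The NP-OB1 in this sentence dominates eight words, so the sketch's version of column 4 selects the "greater than 2" code.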
A coding file takes the place of an ordinary query file in a CorpusSearch run. To code a file, invoke CorpusSearch as follows:
java CorpusSearch <coding_file.c> <file_to_code>
Output files produced by coding carry the extension .cod. They contain every token of the input file, with a coding node inserted as the first daughter of every boundary node. A coding node has the form:
(CODING <coding_string>)
If a given sentence contains more than one boundary node, the output sentence will contain multiple coding nodes. Here is a sentence from the output file produced by the coding file above:
/~*
knewe kyndes & complexciones of men & of bestus
(CMHORSES,85.2)
*~/
( (IP-SUB (CODING n:s:p:3)
(NP-SBJ *T*-1)
(VBD knewe)
(NP-OB1 (NS kyndes)
(CONJ &)
(NS complexciones)
(PP
(PP (P of)
(NP (NS men)))
(CONJP (CONJ &)
(PP (P of)
(NP (NS bestus)))))))
(ID CMHORSES,85.2))
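The insertion step can be sketched in Python as below. Every node whose label matches the boundary pattern gets a (CODING ...) daughter in first position, so a sentence with two IP boundary nodes ends up with two coding nodes. The tuple tree format, the function names, and the toy coding function (which simply marks matrix clauses "m" and all other IPs "s") are hypothetical stand-ins for real coding queries.

```python
from fnmatch import fnmatchcase

# Hypothetical tree format: (label, child, ...), each child a tuple or a word.


def insert_coding(node, boundary, code_for):
    """Copy node, adding (CODING <string>) as the first daughter of every
    node whose label matches the boundary pattern. code_for receives the
    original, uncoded node and returns its coding string."""
    if isinstance(node, str):
        return node  # a word
    kids = [insert_coding(k, boundary, code_for) for k in node[1:]]
    if fnmatchcase(node[0], boundary):
        kids.insert(0, ("CODING", code_for(node)))
    return (node[0], *kids)


def to_brackets(node):
    """Labelled bracketing on one line (CorpusSearch indents its output)."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [to_brackets(k) for k in node[1:]]) + ")"


def toy_code(node):
    # Stand-in for real coding queries.
    return "m" if node[0].startswith("IP-MAT") else "s"


tree = ("IP-MAT", ("NP-SBJ", ("PRO", "he")), ("VBD", "seide"),
        ("CP-THT", ("C", "that"),
         ("IP-SUB", ("NP-SBJ", ("PRO", "he")), ("VBD", "knewe"))))

print(to_brackets(insert_coding(tree, "IP*", toy_code)))
# (IP-MAT (CODING m) (NP-SBJ (PRO he)) (VBD seide) (CP-THT (C that) (IP-SUB (CODING s) (NP-SBJ (PRO he)) (VBD knewe))))
```

Passing the original node to code_for means that coding nodes already inserted lower in the tree never influence the coding of a higher boundary node.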
Coding strings may be searched with the column function. For instance, to find all coding nodes whose coding string has "m" or "p" in column 7, use this query:
query: (CODING column 7 m|p)
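As a rough model of what this query tests, the Python sketch below treats a coding string as colon-separated columns, numbered from 1 as in the coding file, and checks whether a given column holds one of the |-separated codes. The function name is hypothetical, and returning False for a column number past the end of the string is an assumption about behavior the text does not describe.

```python
def column_matches(coding_string, column, alternatives):
    """Rough analogue of (CODING column N a|b): True when 1-based column N
    of a colon-separated coding string equals one of the |-separated codes."""
    values = coding_string.split(":")
    if not 1 <= column <= len(values):
        return False  # assumed: a missing column never matches
    return values[column - 1] in alternatives.split("|")


strings = ["n:s:p", "s:m:t", "n:i:n"]
print([s for s in strings if column_matches(s, 3, "m|p")])  # ['n:s:p']
```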
To obtain a file with only the coding strings, use print_only as follows:
print_only: CODING
The extension of the resulting output file will be .ooo.
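The layout of the .ooo file is not described here, so as a hypothetical alternative the Python sketch below pulls the coding strings straight out of the CODING nodes of .cod output and writes one comma-separated row per coding node, with one field per column. CSV is a format that R and the other statistics packages mentioned at the start of this chapter can generally import. The function names and column headers are illustrative only, and the sample text is a short made-up fragment in the style of .cod output.

```python
import csv
import io
import re

# Matches a coding node such as "(CODING n:s:p)" and captures the string.
CODING_NODE = re.compile(r"\(CODING\s+([^()\s]+)\)")


def coding_rows(cod_text):
    """One list of column values per coding node, in file order."""
    return [m.group(1).split(":") for m in CODING_NODE.finditer(cod_text)]


def to_csv(rows, header):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


cod_text = """( (IP-MAT (CODING n:m:p)
  (NP-SBJ (PRO he))
  (VBD seide)
  (CP-THT (C that)
   (IP-SUB (CODING n:s:p)
    (NP-SBJ (PRO he))
    (VBD knewe)))))"""

print(to_csv(coding_rows(cod_text), ["col1", "col2", "col3"]), end="")
# col1,col2,col3
# n,m,p
# n,s,p
```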