CorpusSearch

CorpusSearch 2 is a Java program that supports research in corpus linguistics. It is useful both for constructing syntactically annotated (parsed) corpora and for searching them. By running CorpusSearch on an appropriately annotated corpus, a user can automatically:

find and count lexical and syntactic configurations of any complexity
correct systematic errors
code the linguistic features of corpus sentences for later statistical analysis

Both the input and output files of CorpusSearch are ordinary text files, with syntactic annotations in the Penn Treebank format.
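
To give a rough idea of what such annotation looks like, the sketch below embeds one made-up sentence in labeled bracketing and counts how often each node label occurs. It is only an illustration: the sentence, its labels, and the class BracketLabelCount are assumptions, not taken from any corpus or from CorpusSearch itself, and it does not use the CorpusSearch query language. It also assumes Java 8 or later, which is newer than the program's own minimum requirement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of CorpusSearch: counts node labels
// in a labeled-bracketing string.
public class BracketLabelCount {

    // Every '(' that is directly followed by a label starts a node;
    // the label runs up to the next whitespace or bracket.
    public static Map<String, Integer> countLabels(String tree) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < tree.length(); i++) {
            if (tree.charAt(i) != '(') {
                continue;
            }
            int j = i + 1;
            while (j < tree.length()
                    && !Character.isWhitespace(tree.charAt(j))
                    && tree.charAt(j) != '('
                    && tree.charAt(j) != ')') {
                j++;
            }
            if (j > i + 1) {
                // Add 1 to the running count for this label.
                counts.merge(tree.substring(i + 1, j), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up example sentence; the labels are illustrative only.
        String tree = "(IP-MAT (NP-SBJ (D The) (N dog)) (VBD chased) "
                + "(NP-OB1 (D the) (N cat)) (. .))";
        System.out.println(countLabels(tree));
    }
}
```

Because the files are plain text, a simple count like this needs nothing more than string scanning. Searches that depend on structural relations between nodes are what CorpusSearch itself is for.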

CorpusSearch 2 runs under any Java-supported operating system, including Linux, Macintosh, Unix and Windows. It requires Java 2, version 1.3 or later. In addition to being downloadable from this site, CorpusSearch is distributed with the Penn-Helsinki Parsed Corpora of Historical English.

The program version on this CD and its documentation are current as of the date below. For updates to the program or to the Users Guide, go to the CorpusSearch SourceForge site.

CorpusSearch 2: a tool for linguistic research

Updates

Features

Users Guide

Credits

Report Bugs

Developers