Penn Corpora User Introduction

1. Conditions of use

All users of the Penn-Helsinki corpora must accept the following conditions of use. If you are not willing to accept these conditions, you must return the CD to the party that provided it to you (the vendor, the library, the instructor, etc.) without using the corpus or making copies of any files on this CD.

Penn-Helsinki Corpus Conditions of Use

Users must accept that the PPCME2, the PPCEME and the Helsinki Corpus of Historical English are subject to copyright restrictions. They must agree to abide by them and understand that violations of copyright restrictions may result in legal liability.
Users may make no commercial use of the PPCME2, the PPCEME, or the Helsinki Corpus without prior permission.
Users may not redistribute the PPCME2, the PPCEME, or the Helsinki Corpus to others except in limited passages under the ordinary standards of scholarly citation.
Users must agree to acknowledge the PPCME2, the PPCEME, and the Helsinki Corpus in any written work or oral presentations based on research using these materials.
Users must accept that the distributor of the PPCME2 and the PPCEME makes no warranties, express or implied, concerning the PPCME2 or the PPCEME, including but not limited to their ownership, merchantability, or fitness for a particular purpose. The distributor shall not be liable for any direct, consequential, punitive, or other damages suffered by user or any other person resulting from the use of the distributed materials.

2. Orientation

The two corpora on this CD, the PPCME2 and the PPCEME, are located in the following two directories:

PENN-CORPORA/PPCME2-RELEASE-2/corpus
PENN-CORPORA/PPCEME-RELEASE-1/corpus

The text samples are provided in three forms: plain text, part-of-speech tagged text, and parsed text. These files are located in the "txt," "pos," and "psd" subdirectories, respectively. In addition, the files of the PPCEME, but not those of the PPCME2, are divided into three equal-sized subdirectories labeled "helsinki," "penn1," and "penn2."
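
As a rough illustration of this layout, the following shell sketch lists every directory of a given format beneath a corpus directory. It searches by name, so it makes no assumption about how the format and "helsinki"/"penn1"/"penn2" subdirectories are nested. The variable CORPORA_ROOT and the function name list_format_dirs are hypothetical, introduced only for this example; set CORPORA_ROOT to wherever you copied the "PENN-CORPORA" folder.

```shell
# Sketch only: CORPORA_ROOT is a hypothetical variable naming the place
# where the "PENN-CORPORA" folder was copied; adjust it to your system.
CORPORA_ROOT="${CORPORA_ROOT:-$HOME/PENN-CORPORA}"

# list_format_dirs DIR FORMAT
# Print every directory named FORMAT ("txt", "pos" or "psd") beneath DIR,
# however deeply it is nested. (Hypothetical helper, not part of the CD.)
list_format_dirs() {
  [ -d "$1" ] || return 0
  find "$1" -type d -name "$2" | sort
}

# Example: show where the parsed text of each corpus lives,
# but only if the folder is actually present.
if [ -d "$CORPORA_ROOT" ]; then
  list_format_dirs "$CORPORA_ROOT/PPCME2-RELEASE-2/corpus" psd
  list_format_dirs "$CORPORA_ROOT/PPCEME-RELEASE-1/corpus" psd
fi
```

Replacing "psd" with "pos" or "txt" lists the directories for the other two forms.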

All documentation for the two corpora and for the search program CorpusSearch is accessible from the Penn Corpora Home Page ("index.html") in the "PENN-CORPORA" folder on the CD.

Please copy the "PENN-CORPORA" folder to your hard drive and open the "index.html" file in your web browser to start exploring the Penn-Helsinki Parsed Corpora of Historical English. The corpus files themselves and the output of any searches using CorpusSearch can be read in a text editor. We recommend an editor like emacs, vi, or pico rather than a word processing program to avoid the danger of changing the files from simple text format to the more complex format of a word processor. CorpusSearch only searches files in text format.

To use CorpusSearch, follow the installation instructions below.

3. How to Install CorpusSearch

First steps:
1. Either use the version supplied on this CD (version 2.002.06) or download the latest version of "CS.jar" from the CorpusSearch Sourceforge site (http://corpussearch.sourceforge.net). Please note that the most recent versions of both the CorpusSearch program and the User's Guide will always be available at this site.
2. Put the file in a convenient place; for example, the "Applications" folder under Mac OS X or the "Program Files" folder under Windows.
3. If you are using Windows, we recommend downloading a copy of the Java Virtual Machine from Sun (version 1.4 or later) rather than running Microsoft's Java. To check which virtual machine you are running under Windows XP, type:
  C:\Documents and Settings\username>java -version
  The web address of the Sun Java site is:
  http://www.java.com/
Unix/Linux users (including Mac OS X):
We assume that you have put "CS.jar" into the top-level directory (folder) "FOO".
1. Open a window that allows you to run commands: the Terminal program under Mac OS X, or an xterm window under Unix or Linux.
2. Type the following line (minus the prompt "%") at the prompt in the terminal window:
  % java -classpath /FOO/CS.jar csearch/CorpusSearch
  The classpath must give the full path to the jar file, using the path syntax of your operating system.
3. Once you are sure that the program runs, you can put an alias to the above command into your shell initialization file (usually, .cshrc or .bashrc).
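For example, an alias along the following lines could be used. This is a minimal sketch: the alias name "csearch" is a hypothetical choice, and "/FOO" is the placeholder directory assumed above, so substitute the real path to your copy of "CS.jar".

```shell
# For bash, add this line to ~/.bashrc.
# "csearch" is a hypothetical alias name; /FOO is the placeholder directory.
alias csearch='java -classpath /FOO/CS.jar csearch/CorpusSearch'

# For csh or tcsh, the alias syntax has no "=" sign. The equivalent line
# for ~/.cshrc would be:
#   alias csearch 'java -classpath /FOO/CS.jar csearch/CorpusSearch'
```

After opening a new terminal window (or re-reading the initialization file), typing "csearch" at the prompt runs the full command.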
Windows users:
We assume that you have put "CS.jar" into the "Program Files" folder.
1. Launch the command prompt: choose Start -> Run and type "cmd" in the box ("command" under Windows 98).
2. You should get the following prompt: "C:\>"
3. Type the following line (minus the prompt itself) at the prompt:
  C:\> java -classpath "C:\Program Files\CS.jar" csearch/CorpusSearch
  Note that the slashes run in different directions in the two arguments after "-classpath": backslashes in the path to the jar file, a forward slash in "csearch/CorpusSearch". Also note the quotation marks around the path to the jar file, which are needed because "Program Files" contains a space.
After copying the "PENN-CORPORA" folder to your hard drive, read the CorpusSearch User's Guide in the "CS-docs" subfolder of the "PENN-CORPORA" folder to start using the program. Start with the file "CS.html".

Last modified: Wed Jul 12 18:31:10 EDT 2006