1. Conditions of use
All users of the Penn-Helsinki corpora must accept the following conditions of
use. If you are not willing to accept these conditions, you must return the CD
to the party that provided it to you (the vendor, the library, the instructor,
etc.) without using the corpus or making copies of any files on this CD.
Penn-Helsinki Corpus Conditions of Use
- Users must accept that the PPCME2, the PPCEME, and the Helsinki Corpus
of Historical English are subject to copyright restrictions. They must
agree to abide by them and understand that violations of copyright
restrictions may result in legal liability.
- Users may make no commercial use of the PPCME2, the PPCEME, or the
Helsinki Corpus without prior permission.
- Users may not redistribute the PPCME2, the PPCEME, or the Helsinki Corpus
to others except in limited passages under the ordinary standards of
scholarly citation.
- Users must agree to acknowledge the PPCME2, the PPCEME, and the Helsinki
Corpus in any written work or oral presentation based on research using
these materials.
- Users must accept that the distributor of the PPCME2 and the PPCEME makes
no warranties, express or implied, concerning the PPCME2 or the PPCEME,
including but not limited to their ownership, merchantability, or fitness
for a particular purpose. The distributor shall not be liable for any
direct, consequential, punitive, or other damages suffered by the user or any
other person resulting from the use of the distributed materials.
2. Orientation
The two corpora on this CD, the PPCME2 and the PPCEME,
are located in the following two directories:
The plain-text, part-of-speech tagged, and parsed versions of the text samples
are located in the "txt," "pos," and "psd" subdirectories, respectively. In
addition, the PPCEME files (but not the PPCME2 files) are divided into three
equal-sized subdirectories, labeled "helsinki," "penn1," and "penn2."
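To check this layout once the corpora are on your hard drive, a small shell
sketch like the one below (for Unix, Linux, or Mac OS X) reports how many files
each of the "txt," "pos," and "psd" subdirectories holds. The function name
count_corpus_files is hypothetical, and its argument is assumed to be the path
of one corpus directory. Because find descends into nested folders, files in
any further subdivisions (such as "penn1") are included in the totals.

```shell
# count_corpus_files CORPUS_DIR
# Hypothetical helper: prints how many files sit under the txt, pos and psd
# subdirectories of one corpus directory, one line per subdirectory.
count_corpus_files() {
  corpus_dir="$1"
  for sub in txt pos psd; do
    if [ -d "$corpus_dir/$sub" ]; then
      # find descends into nested folders; tr strips the padding that some
      # versions of wc add before the number.
      count=$(find "$corpus_dir/$sub" -type f | wc -l | tr -d ' ')
      printf '%s: %s\n' "$sub" "$count"
    else
      printf '%s: missing\n' "$sub"
    fi
  done
}
```

Each output line has the form "txt: N"; a subdirectory that does not exist
under the given path is reported as "missing" rather than causing an error.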
All documentation for the two corpora and for the search program
CorpusSearch is accessible from the Penn Corpora Home Page
("index.html") in the "PENN-CORPORA" folder on the CD.
Please copy the "PENN-CORPORA" folder to your hard drive and open the
"index.html" file in your web browser to start exploring the
Penn-Helsinki Parsed Corpora of Historical English. The corpus files
themselves, and the output of any searches using CorpusSearch, can be read in
a text editor. We recommend a plain-text editor such as emacs, vi, or pico
rather than a word processor, because a word processor may convert the files
from simple text format into its own, more complex format.
CorpusSearch only searches files in plain-text format.
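On Unix, Linux, or Mac OS X, the copy step can also be done from a terminal.
The shell function below is a minimal sketch, assuming you know where your
system mounts the CD; the name copy_penn_corpora and the mount point used in
the example are hypothetical.

```shell
# copy_penn_corpora CD_ROOT DEST_DIR
# Hypothetical helper: copies the PENN-CORPORA folder from the mounted CD
# (CD_ROOT) into DEST_DIR and reports where the home page ended up.
copy_penn_corpora() {
  src_dir="$1/PENN-CORPORA"
  dest_dir="$2"
  # Stop with a clear message if the CD is not mounted where we think it is.
  if [ ! -d "$src_dir" ]; then
    echo "copy_penn_corpora: no PENN-CORPORA folder in $1" >&2
    return 1
  fi
  # Create the destination if needed, then copy the whole folder tree.
  mkdir -p "$dest_dir" || return 1
  cp -R "$src_dir" "$dest_dir/" || return 1
  # index.html is the Penn Corpora Home Page to open in a web browser.
  if [ -f "$dest_dir/PENN-CORPORA/index.html" ]; then
    echo "Open $dest_dir/PENN-CORPORA/index.html in your web browser."
  fi
}
```

For example, with the CD mounted at the placeholder location /Volumes/CDNAME:
% copy_penn_corpora /Volumes/CDNAME ~
On Mac OS X, the command open ~/PENN-CORPORA/index.html then opens the home
page in your default browser; many Linux desktops provide xdg-open for the
same purpose.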
To use CorpusSearch, follow the installation instructions below.
3. How to Install CorpusSearch
- First steps:
- Either use the version supplied on this CD (version 2.002.06) or
download the latest version of "CS.jar" from the CorpusSearch SourceForge site
(http://corpussearch.sourceforge.net). Please note that the most
recent versions of both the CorpusSearch program and the User's Guide
will always be available at this site.
- Put the file in a convenient place; for example, the "Applications"
folder under Mac OS X or the "Program Files" folder under Windows.
- If you are using Windows, we recommend downloading a copy of the Java
Virtual Machine from Sun (version 1.4 or later) rather than running
Microsoft's Java. To check which virtual machine you are running under
Windows XP, type the following at the command prompt:
C:\Documents and Settings\username>java -version
The web address of the Sun Java site is:
http://www.java.com/
- Unix/Linux users (including Mac OS X):
We assume that you have put "CS.jar" into the top-level directory (folder) "FOO".
- Open a window that allows you to run commands: the Terminal program under
Mac OS X, or an xterm window under Unix or Linux.
- Type the following line (minus the prompt "%") at the prompt in the
terminal window:
% java -classpath /FOO/CS.jar csearch/CorpusSearch
The -classpath argument must give the full path to the jar file, written in
your system's path syntax (here, /FOO/CS.jar).
- Once you are sure that the program runs, you can put an alias for the above command into your shell initialization file (usually .bashrc for bash or .cshrc for csh/tcsh).
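As an illustration, a line like the following in .bashrc defines such an
alias. The alias name corpussearch is an arbitrary choice, and /FOO/CS.jar is
the example location assumed above; adjust both to your setup.

```shell
# For ~/.bashrc (bash). "corpussearch" is an arbitrary alias name and
# /FOO/CS.jar is the example location used above; adjust both as needed.
alias corpussearch='java -classpath /FOO/CS.jar csearch/CorpusSearch'
```

In a new terminal window, typing corpussearch then runs the full command, and
anything typed after it is passed along as further arguments. The csh/tcsh
equivalent for .cshrc omits the equals sign:
alias corpussearch 'java -classpath /FOO/CS.jar csearch/CorpusSearch'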
- Windows users:
We assume that you have put "CS.jar" into the "Program Files" folder.
- Open the command prompt by choosing Start -> Run and typing "cmd" in the box
("command" under Windows 98).
- You should get the following prompt: "C:\>"
- Type the following line (minus the prompt itself) at the prompt:
C:\> java -classpath "C:\Program Files\CS.jar" csearch/CorpusSearch
Note that the path to "CS.jar" uses backslashes, while the class name "csearch/CorpusSearch" uses a forward slash. Also note the quotation marks around the path; they are needed because "Program Files" contains a space.
- After copying the "PENN-CORPORA" folder to your hard drive, read the
CorpusSearch User's Guide in the "CS-docs" subfolder of the PENN-CORPORA folder
to start using the program. Start with the file "CS.html".
Last modified: Wed Jul 12 18:31:10 EDT 2006