The PPCME2 text samples are based largely on the Middle English section of the Diachronic Part of the Helsinki Corpus of English Texts (available from ICAME), with certain additions and deletions. The PPCME2 samples, however, are considerably larger than the Helsinki ones. For the earliest Helsinki time period, all texts are sampled exhaustively. For the later Helsinki time periods, two texts per period are expanded to 50,000 words, and the remaining texts are represented by the Helsinki Corpus sample.
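The main Helsinki time periods are M1-M4, each covering approximately one hundred years. In addition, texts originally written in a given period but for which the earliest manuscript is from a later period are given two-digit period designations: the first digit is the composition period and the second is the manuscript period. Texts whose composition date is unknown are designated MX followed by the digit of the manuscript period. The following is a list of all Helsinki periods as they appear in the corpus file names: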
Period designation    Composition date    Manuscript date
MX1                   unknown             1150-1250
M1                    1150-1250           1150-1250
M2                    1250-1350           1250-1350
M23                   1250-1350           1350-1420
M24                   1250-1350           1420-1500
M3                    1350-1420           1350-1420
M34                   1350-1420           1420-1500
MX4                   unknown             1420-1500
M4                    1420-1500           1420-1500
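Because the period designation is part of each corpus file name, the dates for a file can be looked up mechanically. The following is a minimal sketch, not part of the corpus distribution: the function names `period_of` and `dates_of` are hypothetical, and the code assumes the designation appears as one of the dot-separated components of the file name (it also tolerates a further trailing suffix, should a file have one).

```python
# Period designations copied from the list of Helsinki periods:
# designation -> (composition date, manuscript date)
PERIODS = {
    "MX1": ("unknown", "1150-1250"),
    "M1": ("1150-1250", "1150-1250"),
    "M2": ("1250-1350", "1250-1350"),
    "M23": ("1250-1350", "1350-1420"),
    "M24": ("1250-1350", "1420-1500"),
    "M3": ("1350-1420", "1350-1420"),
    "M34": ("1350-1420", "1420-1500"),
    "MX4": ("unknown", "1420-1500"),
    "M4": ("1420-1500", "1420-1500"),
}


def period_of(filename):
    """Return the period designation found in a corpus file name.

    Assumption: the designation is one of the dot-separated components;
    components are checked from right to left.
    """
    for part in reversed(filename.split(".")):
        if part.upper() in PERIODS:
            return part.upper()
    raise ValueError(f"no period designation in {filename!r}")


def dates_of(filename):
    """Return (composition date, manuscript date) for a corpus file name."""
    return PERIODS[period_of(filename)]


if __name__ == "__main__":
    for name in ("cmaelr3.m23", "cmorm.po.m1", "cmthorn.mx4"):
        composition, manuscript = dates_of(name)
        print(name, period_of(name), composition, manuscript)
```

Checking components from the right means that a name with an extra internal component, such as cmorm.po.m1, still resolves to its period.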
The current edition of the PPCME2 includes a total of roughly 1.2
million words of running text. It comprises 55 text samples, each of
which is given in three forms: a text file, a part-of-speech tagged file
and a parsed file. In addition, there is a file with philological and
bibliographical information about each text.
The wordcounts for the individual files are given in the table
below. The information in the table is also contained in the file
WORDCOUNT-PPCME2 in the info directory for the PPCME2. The
file contains only ASCII characters and is suitable for importing into a
spreadsheet program; the field separator is the space character.
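For readers who prefer a script to a spreadsheet, the following is a minimal sketch of reading such a file. It assumes that each file's entry sits on its own line, with the file name and the count separated by a space, as in the per-file listing on this page; the function name `parse_wordcounts` is hypothetical, and the two sample rows are copied from that listing.

```python
import io


def parse_wordcounts(lines):
    """Map file name -> word count from space-separated records.

    Assumption: one record per line, 'name count'. Lines that do not
    match that shape (blank lines, headings) are skipped.
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        name, count = fields
        counts[name] = int(count)
    return counts


# Two records copied from the per-file listing, standing in for the file.
sample = io.StringIO("cmaelr3.m23 17013\ncmaelr4.m4 11181\n")
counts = parse_wordcounts(sample)
print(counts)
```

To read the real file, pass an open file object instead of the `io.StringIO` stand-in, for example `parse_wordcounts(open(path, encoding="ascii"))`, where `path` points at WORDCOUNT-PPCME2 in the info directory.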
Wordcount information
Below is a table of the number of words by Helsinki time period:
Period    Word count
MX1           62,596
M1           195,494
M2            93,999
M23           17,013
M24           35,591
M3           385,994
M34           99,994
MX4            5,168
M4           260,116
Total      1,155,965
cmaelr3.m23 17013
cmaelr4.m4 11181
cmancriw-1.m1 48566
cmancriw-2.m1 15224
cmastro.m3 6897
cmayenbi.m2 45944
cmbenrul.m3 18221
cmboeth.m3 10415
cmbrut3.m3 49099
cmcapchr.m4 52716
cmcapser.m4 1469
cmcloud.m3 15599
cmctmeli.m3 17005
cmctpars.m3 30416
cmearlps.m2 44521
cmedmund.m4 3847
cmedthor.m34 13949
cmedvern.m3 12843
cmequato.m3 6261
cmfitzja.m4 5652
cmgaytry.m34 5238
cmgregor.m4 37326
cmhali.m1 8495
cmhilton.m34 4963
cmhorses.m3 5902
cminnoce.m4 4329
cmjulia.m1 6810
cmjulnor.m34 5004
cmkathe.m1 8699
cmkempe.m4 60212
cmkentho.m1 4048
cmkentse.m2 3534
cmlamb1.m1 6459
cmlambx1.mx1 20752
cmmalory.m4 57775
cmmandev.m3 49690
cmmarga.m1 8069
cmmirk.m34 57548
cmntest.m3 11081
cmorm.po.m1 50579
cmotest.m3 10015
cmpeterb.m1 6757
cmpolych.m3 46444
cmpurvey.m3 39704
cmreynar.m4 8850
cmreynes.m4 9100
cmrollep.m24 17960
cmrolltr.m24 17631
cmroyal.m34 6231
cmsawles.m1 4111
cmsiege.m4 7659
cmthorn.mx4 5168
cmtrinit.mx1 41844
cmvices1.m1 27677
cmvices4.m34 7061
cmwycser.m3 56402
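The per-period figures can be recomputed from the per-file counts by grouping on the period designation in each file name. The sketch below does this for a subset of the rows above (all files in periods MX1, M2, M23, M24 and MX4); the function name `totals_by_period` is hypothetical, and the code assumes the designation is the last dot-separated component of the name.

```python
from collections import Counter

# Rows copied from the per-file listing: every file in MX1, M2, M23, M24, MX4.
ROWS = """\
cmaelr3.m23 17013
cmayenbi.m2 45944
cmearlps.m2 44521
cmkentse.m2 3534
cmlambx1.mx1 20752
cmrollep.m24 17960
cmrolltr.m24 17631
cmthorn.mx4 5168
cmtrinit.mx1 41844
"""


def totals_by_period(text):
    """Sum word counts per period designation.

    Assumption: the designation is the last dot-separated component
    of each file name.
    """
    totals = Counter()
    for line in text.splitlines():
        name, count = line.split()
        period = name.rsplit(".", 1)[-1].upper()
        totals[period] += int(count)
    return totals


totals = totals_by_period(ROWS)
for period, words in sorted(totals.items()):
    print(f"{period:<4} {words:>7,}")
```

For these five periods the sums agree with the table of word counts by Helsinki time period: 62,596 for MX1, 93,999 for M2, 17,013 for M23, 35,591 for M24 and 5,168 for MX4.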