Table 1: Wordcount summary by time period and subcorpus

Period | Helsinki | Penn 1 | Penn 2 | Total |
---|---|---|---|---|
E1 (1500-1569) | 196,754 | 194,018 | 185,423 | 576,195 |
E2 (1570-1639) | 196,742 | 223,064 | 232,993 | 652,799 |
E3 (1640-1710) | 179,477 | 197,908 | 187,631 | 565,016 |
Total | 572,973 | 614,990 | 606,047 | 1,794,010 |
Table 2: Wordcount summary by text genre

Text genre | Number of words | Percentage |
---|---|---|
Bible | 134,275 | 7.5% |
Biography, autobiography | 41,379 | 2.3% |
Biography, other | 52,755 | 2.9% |
Diary, private | 123,106 | 6.9% |
Drama, comedy | 120,428 | 6.7% |
Educational treatise | 113,032 | 6.3% |
Fiction | 116,494 | 6.5% |
Handbook, other | 112,419 | 6.3% |
History | 108,706 | 6.1% |
Law | 115,863 | 6.5% |
Letters, non-private | 59,868 | 3.3% |
Letters, private | 116,915 | 6.5% |
Philosophy | 85,107 | 4.7% |
Proceedings, trials | 150,090 | 8.4% |
Science, medicine | 41,786 | 2.3% |
Science, other | 79,050 | 4.4% |
Sermon | 97,400 | 5.4% |
Travelogue | 125,337 | 7.0% |
Total | 1,794,010 | 100% |
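The percentages in Table 2 are each genre's share of the corpus total. As a minimal sketch of that arithmetic, the Python snippet below recomputes the share for three genres copied from the table; the names `GENRE_WORDS`, `CORPUS_TOTAL`, and `percentage` are illustrative and not part of the corpus distribution.

```python
# Word counts copied from Table 2 (three genres only, for illustration).
GENRE_WORDS = {
    "Bible": 134_275,
    "Biography, autobiography": 41_379,
    "Travelogue": 125_337,
}

# Grand total as given in Tables 1 and 2.
CORPUS_TOTAL = 1_794_010


def percentage(words: int, total: int = CORPUS_TOTAL) -> float:
    """Return `words` as a percentage of `total`, rounded to one decimal."""
    return round(100 * words / total, 1)


if __name__ == "__main__":
    for genre, words in GENRE_WORDS.items():
        print(f"{genre}: {percentage(words)}%")
```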
Finally, Table 3 lists the wordcount of each individual text.
Table 3: Wordcount information by individual text, broken down by time period and subcorpus

E1 (1500-1569)

Text | Helsinki | Penn 1 | Penn 2 |
---|---|---|---|
abott-e1 | 0 | 363 | 0 |
ambass-e1 | 0 | 0 | 1410 |
aplumpt-e1 | 1863 | 319 | 0 |
apoole-e1 | 0 | 0 | 211 |
asch-e1 | 5050 | 5339 | 5209 |
bedyll-e1 | 448 | 0 | 0 |
boethco-e1 | 10144 | 11772 | 10514 |
chaplain-e1 | 0 | 0 | 5171 |
cromwell-e1 | 434 | 512 | 870 |
dacre-e1 | 0 | 0 | 991 |
delapole-e1 | 0 | 260 | 0 |
dplumpt-e1 | 397 | 0 | 0 |
ebeaum-e1 | 401 | 0 | 0 |
ecumberl-e1 | 190 | 0 | 0 |
edward-e1 | 6207 | 6566 | 6655 |
eliz-1560-e1 | 0 | 0 | 313 |
elyot-e1 | 5603 | 5752 | 6059 |
epoole-e1 | 0 | 0 | 919 |
fabyan-e1 | 5732 | 5907 | 5805 |
fisher-e1 | 4857 | 5382 | 5209 |
fitzh-e1 | 5264 | 5439 | 5613 |
friar-e1 | 0 | 907 | 0 |
gascoigne-1500-e1 | 0 | 536 | 0 |
gascoigne-1510-e1 | 0 | 305 | 0 |
gcromw-e1 | 500 | 526 | 242 |
gpoole-1500-e1 | 0 | 0 | 2033 |
gpoole-1510-e1 | 0 | 0 | 1119 |
grey-e1 | 0 | 88 | 0 |
harman-e1 | 5409 | 5910 | 5621 |
henry-1510-e1 | 0 | 0 | 206 |
henry-1520-e1 | 917 | 1117 | 0 |
henry-1530-e1 | 0 | 0 | 758 |
interview-e1 | 0 | 0 | 380 |
iplumpt-e1 | 356 | 0 | 0 |
kscrope-1530-e1 | 257 | 0 | 0 |
latimer-e1 | 5099 | 5430 | 5121 |
leland-e1 | 6772 | 7132 | 6742 |
lords-e1 | 433 | 0 | 0 |
machyn-e1 | 6908 | 7424 | 7394 |
manners-e1 | 0 | 0 | 284 |
marches-e1 | 0 | 493 | 0 |
merrytal-e1 | 6656 | 6950 | 6846 |
mhoward-e1 | 0 | 0 | 322 |
morelet1-e1 | 507 | 0 | 0 |
morelet2-e1 | 4207 | 4295 | 0 |
moreric-e1 | 5709 | 6398 | 6458 |
morewol-e1 | 1440 | 1589 | 1396 |
mowntayne-e1 | 6055 | 6243 | 1660 |
mroper-e1 | 858 | 4291 | 3368 |
mtudor-1510-e1 | 0 | 910 | 470 |
mtudor-1520-e1 | 0 | 0 | 197 |
nevill-e1 | 0 | 158 | 0 |
record-e1 | 6768 | 7206 | 6583 |
roper-e1 | 5674 | 5742 | 5766 |
rplumpt-e1 | 352 | 459 | 0 |
rplumpt2-e1 | 0 | 561 | 335 |
russell-e1 | 0 | 0 | 371 |
savill-e1 | 0 | 1734 | 0 |
stat-1500-e1 | 7848 | 0 | 0 |
stat-1510-e1 | 0 | 0 | 2318 |
stat-1530-e1 | 0 | 9155 | 0 |
stat-1540-e1 | 3979 | 3733 | 0 |
stat-1550-e1 | 0 | 0 | 5788 |
stat-1560-e1 | 0 | 0 | 4199 |
stevenso-e1 | 6501 | 6724 | 3309 |
surety-e1 | 0 | 0 | 146 |
thoward-e1 | 391 | 297 | 0 |
throckm-e1 | 17364 | 0 | 0 |
torkingt-e1 | 7446 | 7675 | 2764 |
tunstall-e1 | 1123 | 906 | 0 |
turner-e1 | 5142 | 5307 | 5215 |
turnerherb-e1 | 0 | 0 | 819 |
tyndnew-e1 | 11150 | 12365 | 10848 |
tyndold-e1 | 10079 | 10759 | 10292 |
udall-e1 | 4807 | 5072 | 8034 |
underhill-e1 | 0 | 0 | 5609 |
vicary-e1 | 6280 | 6779 | 6322 |
wcecil-1560-e1 | 1208 | 0 | 0 |
wolsey-e1 | 1064 | 1231 | 1139 |
wplumpt-1500-e1 | 336 | 0 | 0 |
wplumpt-1510-e1 | 227 | 0 | 0 |
wplumpt-1530-e1 | 342 | 0 | 0 |

E2 (1570-1639)

Text | Helsinki | Penn 1 | Penn 2 |
---|---|---|---|
armin-e2 | 5267 | 5416 | 5219 |
authnew-e2 | 11625 | 12883 | 11290 |
authold-e2 | 10862 | 11348 | 10774 |
bacon-e2 | 5948 | 6139 | 6169 |
blundev-e2 | 6573 | 7190 | 6983 |
boethel-e2 | 7206 | 7942 | 7445 |
brinsley-e2 | 5669 | 5930 | 5900 |
clowes-e2 | 7330 | 7654 | 5330 |
clowesobs-e2 | 0 | 0 | 2091 |
conway-e2 | 720 | 0 | 0 |
coverte-e2 | 6206 | 6467 | 6304 |
deloney-e2 | 7729 | 8599 | 8211 |
dering-e2 | 0 | 0 | 1763 |
edmondes-e2 | 1264 | 1808 | 1087 |
eliz-1570-e2 | 0 | 0 | 1061 |
eliz-1580-e2 | 0 | 1660 | 0 |
eliz-1590-e2 | 1576 | 0 | 300 |
essex-e2 | 5912 | 3215 | 0 |
essexstate-e2 | 0 | 3604 | 7095 |
everard-e2 | 557 | 0 | 0 |
forman-diary-e2 | 0 | 4125 | 3173 |
forman-e2 | 4142 | 0 | 0 |
gawdy-e2 | 944 | 1027 | 1295 |
gifford-e2 | 6332 | 6632 | 6232 |
harley-e2 | 1737 | 461 | 0 |
harleyedw-e2 | 0 | 1569 | 1804 |
hatcher-e2 | 0 | 281 | 0 |
hayward-e2 | 5305 | 5706 | 5671 |
hoby-e2 | 6156 | 7136 | 6496 |
hooker-a-e2 | 2514 | 2751 | 2995 |
hooker-b-e2 | 2798 | 2833 | 2245 |
jbarring-e2 | 875 | 986 | 905 |
jotaylor-e2 | 9475 | 9679 | 9419 |
joxinden-e2 | 0 | 636 | 1540 |
jubarring-e2 | 0 | 777 | 582 |
judall-e2 | 0 | 0 | 9195 |
knyvett-1620-e2 | 2984 | 4419 | 3641 |
koxinden-e2 | 170 | 205 | 253 |
kpaston-e2 | 843 | 1164 | 733 |
kscrope-1580-e2 | 0 | 253 | 0 |
madox-e2 | 6438 | 6790 | 6744 |
markham-e2 | 6243 | 6443 | 6416 |
masham-e2 | 859 | 1022 | 901 |
middlet-e2 | 6401 | 6573 | 6432 |
moxinden-e2 | 0 | 332 | 497 |
nferrar-e2 | 127 | 429 | 498 |
perrott-e2 | 5283 | 5689 | 5158 |
pettit-e2 | 434 | 467 | 574 |
peyton-e2 | 343 | 0 | 0 |
proud-1620-e2 | 176 | 0 | 0 |
proud-1630-e2 | 0 | 472 | 0 |
raleigh-e2 | 9300 | 9503 | 996 |
rcecil-e2 | 0 | 1398 | 1157 |
rferrar-e2 | 199 | 0 | 0 |
rich-e2 | 0 | 1697 | 0 |
roxinden-1600-e2 | 0 | 319 | 0 |
roxinden-1620-e2 | 458 | 75 | 0 |
roxinden2-e2 | 0 | 212 | 439 |
shakesp-e2 | 6923 | 7498 | 7290 |
smith-e2 | 5399 | 6732 | 4683 |
stat-1570-e2 | 0 | 2406 | 2376 |
stat-1580-e2 | 1526 | 0 | 6618 |
stat-1590-e2 | 5193 | 0 | 0 |
stat-1600-e2 | 5175 | 0 | 2246 |
stat-1620-e2 | 0 | 0 | 4593 |
stat-1640-e2 | 0 | 0 | 7130 |
stow-e2 | 5559 | 6245 | 5327 |
talbot-e2 | 0 | 0 | 266 |
tbarring-e2 | 443 | 582 | 607 |
thoward2-e2 | 0 | 17685 | 18179 |
trincoll-e2 | 201 | 0 | 0 |
wcecil-1580-e2 | 905 | 0 | 665 |
wpaston2-e2 | 438 | 0 | 0 |

E3 (1640-1710)

Text | Helsinki | Penn 1 | Penn 2 |
---|---|---|---|
alhatton-e3 | 758 | 0 | 0 |
alhatton2-e3 | 0 | 891 | 0 |
anhatton-e3 | 526 | 712 | 609 |
aungier-e3 | 1065 | 1010 | 1302 |
behn-e3 | 5742 | 6137 | 5680 |
boethpr-e3 | 9075 | 11272 | 9737 |
boyle-e3 | 5423 | 4340 | 0 |
boylecol-e3 | 0 | 1349 | 5790 |
burnetcha-e3 | 5813 | 5928 | 5849 |
burnetroc-e3 | 6432 | 6557 | 6454 |
capel-e3 | 480 | 570 | 558 |
charles-1650-e3 | 0 | 959 | 0 |
charles-1670-e3 | 824 | 0 | 1075 |
chatton-e3 | 587 | 747 | 576 |
commiss-e3 | 715 | 0 | 0 |
conway2-e3 | 0 | 0 | 1794 |
counc-e3 | 190 | 0 | 0 |
dell-e3 | 0 | 591 | 0 |
drummond-e3 | 0 | 2065 | 0 |
ehatton-e3 | 360 | 0 | 0 |
ehatton2-e3 | 0 | 0 | 856 |
eoxinden-1650-e3 | 0 | 344 | 0 |
eoxinden-1660-e3 | 1868 | 519 | 0 |
eoxinden-1680-e3 | 0 | 558 | 0 |
evelyn-e3 | 5945 | 6268 | 6325 |
farquhar-e3 | 6369 | 6891 | 6098 |
fhatton-e3 | 469 | 0 | 0 |
fiennes-e3 | 5171 | 5444 | 5640 |
fox-e3 | 5540 | 6118 | 6012 |
fryer-e3 | 5733 | 6052 | 6045 |
hooke-e3 | 6381 | 7275 | 7189 |
hoole-e3 | 6345 | 6587 | 6700 |
hoxinden-1640-e3 | 0 | 3357 | 0 |
hoxinden-1650-e3 | 0 | 0 | 3817 |
hoxinden-1660-e3 | 3395 | 0 | 0 |
jackson-e3 | 0 | 500 | 0 |
jetaylor-e3 | 6417 | 5362 | 0 |
jetaylormeas-e3 | 0 | 1344 | 6735 |
jopinney-e3 | 407 | 525 | 2012 |
jpinney-e3 | 879 | 189 | 0 |
langf-e3 | 7123 | 7750 | 7811 |
lisle-e3 | 6428 | 7249 | 6685 |
locke-e3 | 5237 | 5864 | 5453 |
memo-e3 | 0 | 863 | 452 |
mhatton-e3 | 0 | 550 | 0 |
milton-e3 | 6661 | 7320 | 7313 |
montague-e3 | 0 | 0 | 1321 |
nhadd-1700-e3 | 255 | 520 | 456 |
oates-e3 | 8689 | 9758 | 9233 |
osborne-e3 | 988 | 474 | 0 |
penny-e3 | 6893 | 8315 | 5894 |
pepys-e3 | 5189 | 5821 | 5346 |
phenry-e3 | 576 | 808 | 585 |
proposals-e3 | 0 | 0 | 460 |
rhaddjr-e3 | 208 | 0 | 0 |
rhaddsr-1650-e3 | 0 | 1936 | 0 |
rhaddsr-1670-e3 | 710 | 0 | 2501 |
rhaddsr-1680-e3 | 0 | 0 | 205 |
rhaddsr-1700-e3 | 989 | 0 | 186 |
somers-e3 | 726 | 0 | 0 |
southard-e3 | 0 | 405 | 0 |
spencer-1680-e3 | 793 | 0 | 0 |
spencer-1700-e3 | 0 | 979 | 0 |
spencer-1700-e3 | 0 | 0 | 822 |
stat-1660-e3 | 0 | 0 | 10615 |
stat-1670-e3 | 0 | 0 | 3455 |
stat-1690-e3 | 13241 | 14269 | 0 |
strype-e3 | 1016 | 1354 | 0 |
tillots-a-e3 | 3201 | 2814 | 0 |
tillots-b-e3 | 3708 | 5066 | 0 |
tillots-c-e3 | 0 | 0 | 6784 |
vanbr-e3 | 8061 | 8659 | 8786 |
walton-e3 | 5876 | 6673 | 6089 |
zouch-e3 | 0 | 0 | 326 |
In addition, the filenames in the PPCEME contain an indication of which subcorpus they belong to.
A few examples:
In tripling the size of the samples from the Helsinki Corpus, we
have sometimes had to include texts by new authors (either because the
Helsinki Corpus sample for an author was itself already exhaustive, or
because we ran out of text in the course of tripling the sample size).
In what follows, we describe the conventions that we have followed in
assigning filenames to these new authors. Our general rule has been
to leave Helsinki Corpus filenames unchanged, but we have sometimes
slightly modified the original Helsinki
filenames for clarity and consistency. These modifications, as well as
which PPCEME files supplement which Helsinki Corpus files, are set out in
Table 4.
In the correspondence of important families (such as that of the
Barringtons, the Hattons, or the Plumptons), the Helsinki Corpus tends
to identify women by their birthname, and we retain those filenames.
So Anne Finch, countess of Nottingham, née Hatton, is identified as
anhatton (not finch).
Where the Helsinki Corpus distinguishes individuals with the same
name by means of "jr" and "sr", we retain that usage. However, in
texts added at Penn, we do not specially distinguish the father-son
relationship. We follow this convention in order to avoid having to
change Helsinki filenames. So the Valentine Pettit of the Helsinki
Corpus is identified as pettit, and his son of the same name, who is
not represented in the Helsinki Corpus, is identified as pettit2.
For clarity and consistency, we have adopted the convention that
Arabic numbers immediately following an author's name always indicate
distinct authors. This forces us to modify Helsinki filenames
of the type mentioned in the preceding paragraph. For example, hooker1 and
hooker2 become hooker-a and hooker-b. In the case of the wplumpt
and stat files, we distinguish the individual files by decade, as
described directly below.
Similarly, the wplumpt files mentioned earlier are identified as
wplumpt-1500, wplumpt-1510, and wplumpt-1530, and the statute files
appear as stat-1500, stat-1510, and so on.
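The conventions above (a digit directly after the name for a distinct author, a hyphenated decade or letter for distinct samples) can be sketched as a small parser. This is only an illustration in Python: it assumes text identifiers of the form shown in Table 3 (name, optional qualifier, period suffix), and the names `TextId`, `parse_text_id`, and `qualifier_kind` are hypothetical, not part of the corpus tools. Retained Helsinki names such as morelet1 and morelet2 are exceptions in which the digit does not mark a distinct author, and the sketch does not detect them.

```python
import re
from typing import NamedTuple, Optional


class TextId(NamedTuple):
    author: str                   # e.g. "stat", "hooker", "roxinden"
    author_number: Optional[int]  # digit directly after the name, if any
    qualifier: Optional[str]      # decade ("1500"), letter ("a"), or other tag
    period: str                   # "e1", "e2" or "e3"


# Assumed shape: name[digits][-qualifier]-period, as in Table 3.
_ID = re.compile(
    r"^(?P<author>[a-z]+?)(?P<num>\d+)?"
    r"(?:-(?P<qual>[a-z0-9]+))?"
    r"-(?P<period>e[123])$"
)


def parse_text_id(text_id: str) -> TextId:
    """Split an identifier such as 'stat-1500-e1' into its parts."""
    match = _ID.match(text_id)
    if match is None:
        raise ValueError(f"unrecognized text identifier: {text_id!r}")
    num = match.group("num")
    return TextId(
        author=match.group("author"),
        author_number=int(num) if num else None,
        qualifier=match.group("qual"),
        period=match.group("period"),
    )


def qualifier_kind(qualifier: Optional[str]) -> str:
    """Classify a qualifier as 'decade', 'letter', 'other', or 'none'."""
    if qualifier is None:
        return "none"
    if re.fullmatch(r"1[5-7]\d0", qualifier):
        return "decade"
    if re.fullmatch(r"[a-z]", qualifier):
        return "letter"
    return "other"


if __name__ == "__main__":
    for name in ["stat-1500-e1", "hooker-a-e2", "roxinden2-e2", "armin-e2"]:
        parsed = parse_text_id(name)
        print(name, parsed, qualifier_kind(parsed.qualifier))
```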
In one or two cases, material by a single author spans more than
a decade, but we identify the samples in some other way than by
decade in order to avoid changing a Helsinki filename. For instance,
Brilliana Harley's letters to her husband (all but one of which are
part of the Helsinki Corpus) are identified as harley, and her
later letters to her son, which are not included in the Helsinki
Corpus, are identified as harleyedw. Thomas More's one personal
letter from the 1520s (included in the Helsinki Corpus) is
identified as morelet1, and his personal letters from the 1530s
(some from the Helsinki Corpus, and some added by us) are identified
as morelet2.
Name vs. title
Following the conventions of the Helsinki Corpus, authors are identified
by name rather than by title. Sovereigns of England are identified by
their given name. For instance, Charles II is identified as charles.
Other members of the nobility, including members of the royal family,
are identified by their surname. For instance, Thomas Howard, earl of
Surrey, 2nd duke of Norfolk, is identified as thoward (not norfolk), and
Mary Tudor (Henry VIII's sister, not to be confused with his daughter,
Mary I, who is not represented in the corpus) as mtudor.
In one or two cases, the Helsinki Corpus uses a title
rather than a surname as the basis for a filename. For instance,
Eleanor Clifford, countess of Cumberland, is identified as ecumberl
(not clifford). In such cases, we retain the Helsinki filename in
order to minimize confusion.
Women's names
As a general rule, women are identified by their surname at the time of
writing. Generally (though not always), this is a married name. In
order to minimize confusion, we do not change filenames to reflect a
later marriage. Two examples:
In one or two cases, a woman appears in the Helsinki
Corpus under her married name despite belonging to one of the
important correspondence families. For instance, Joan Everard and
Elizabeth Masham, both née Barrington, are identified as
everard (not jobarring) and masham (not ebarring). In such cases,
we use the Helsinki filenames in order to minimize confusion.
Modifications of Helsinki Corpus filenames
Under certain circumstances, we have modified the filenames in the
Helsinki Corpus for clarity and consistency. The conventions governing
these modifications are given here, and the correspondences between the
old and new filenames are set out in Table 4.
In one or two cases, it is not clear whether a
Helsinki Corpus file contains material spanning more than a decade
or not. In such cases, we have divided the file into the two
identifiable time chunks, but not identified them by decade.
For instance, tillots is divided into tillots-a (from before 1671)
and tillots-b (from 1679).
Table 4: Summary of filename modifications and PPCEME-Helsinki correspondences

Helsinki filename | PPCEME filename (if different from Helsinki) | Supplemented by |
---|---|---|
alhatton | --- | alhatton2, ehatton2 |
bedyll | --- | friar, russell |
boyle | --- | boylecol |
clowes | --- | clowesobs |
conway | --- | rich |
counc | --- | dell |
ebeaum | --- | mtudor-1510, mtudor-1520 |
ecumberl | --- | manners, delapole |
ehatton | --- | mhatton, montague |
eliz1, eliz2 | included in eliz-1590 | eliz-1560, eliz-1570, eliz-1580 |
eoxinden | included in eoxinden-1660 | dering, eoxinden-1650, eoxinden-1680, jackson, zouch |
essex | --- | essexstate |
everard | --- | jubarring |
fhatton | --- | mhatton |
harley | --- | harleyedw |
henry1, henry2 | included in henry-1520 | henry-1530 |
hooker1 | included in hooker-a | --- |
hooker2 | included in hooker-b | --- |
hoxinden | hoxinden-1660 | hoxinden-1640, hoxinden-1650 |
jetaylor | --- | jetaylormeas |
jpinney | --- | southard, part of jopinney |
knyvett | included in knyvett-1620 | knyvett-1630 |
kscrope | kscrope-1530 | grey, kscrope-1580, mhoward |
lords | --- | interview, marches, surety |
morelet1, morelet2 | --- | part of mroper (see Remarks therein) |
mowntayne | --- | underhill |
nhadd | included in nhadd-1700 | nhadd-1710 |
osborne | --- | conway2 |
pettit | --- | pettit2 |
peyton | --- | moxinden |
Plumpton correspondence | --- | abott, apoole, epoole, gascoigne, gpoole, nevill, rplumpt2, savill |
proud | proud-1620 | proud-1630 |
raleigh | --- | judall |
rferrar | --- | part of nferrar |
rhaddsr | included in rhaddsr-1670 and rhaddsr-1700 | rhaddsr-1650, rhaddsr-1710 |
roxinden | included in roxinden-1620 | roxinden-1600, roxinden2 |
somers | --- | drummond |
stat3 | stat-1500, included in stat-1540; see info for stat-period1.e1 | stat-1510, stat-1530, stat-1550, stat-1560 |
stat4 | stat-1590, included in stat-1600; see info for stat-period2.e2 | stat-1570, stat-1580, stat-1620, stat-1640 |
stat7 | included in stat-1690; see info for stat-period3.e3 | stat-1660 |
stevenso | --- | part of udall |
strype | --- | joxinden |
thoward | --- | dacre |
throckm | --- | thoward2 |
tillots | divided into tillots-a, tillots-b | tillots-c |
torkingt | --- | chaplain |
trincoll | --- | hatcher, talbot |
tunstall | --- | ambass |
turner | --- | turnerherb |
wcecil | included in wcecil-1580 | wcecil-1560 |
wpaston2 | --- | joxinden |
wplumpt1 | wplumpt-1500 | --- |
wplumpt2 | wplumpt-1510 | --- |
wplumpt3 | wplumpt-1530 | --- |
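
The second column of Table 4 can be treated as a lookup from a Helsinki filename to the PPCEME file or files that contain its material. The Python sketch below copies a handful of rows from the table for illustration; the names `PPCEME_FILES` and `ppceme_files` are hypothetical, and the fallback of returning the name unchanged is an assumption that only holds for files marked "---" in the table.

```python
from typing import Dict, List

# A few rows copied from Table 4: Helsinki filename -> PPCEME file(s)
# holding that material. Rows not listed here are NOT covered.
PPCEME_FILES: Dict[str, List[str]] = {
    "hoxinden": ["hoxinden-1660"],
    "kscrope": ["kscrope-1530"],
    "proud": ["proud-1620"],
    "hooker1": ["hooker-a"],
    "hooker2": ["hooker-b"],
    "knyvett": ["knyvett-1620"],
    "rhaddsr": ["rhaddsr-1670", "rhaddsr-1700"],
    "tillots": ["tillots-a", "tillots-b"],
    "wplumpt1": ["wplumpt-1500"],
    "wplumpt2": ["wplumpt-1510"],
    "wplumpt3": ["wplumpt-1530"],
}


def ppceme_files(helsinki_name: str) -> List[str]:
    """Return the PPCEME file(s) for a Helsinki filename.

    Assumption: a name absent from PPCEME_FILES is unchanged in the
    PPCEME (the "---" rows of Table 4), so it is returned as-is.
    """
    return list(PPCEME_FILES.get(helsinki_name, [helsinki_name]))


if __name__ == "__main__":
    for name in ["tillots", "wplumpt2", "boyle"]:
        print(name, "->", ", ".join(ppceme_files(name)))
```

Returning a list rather than a single name keeps the divided and merged cases (such as tillots and rhaddsr) in the same interface as the simple renamings.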