ICAME Collection of English Language Corpora
============================================

 Coordinators:Knut Hofland (project leader, conversion and indexing of
              texts),
              Anne Lindebjerg (manuals and CD label/booklet),
              Jørn Tunestvedt (manuals),
              HIT-Centre,
              University of Bergen
 Version 2    Bergen, June 1999
              ISBN 82-7283-091-4
 Publisher/   The HIT Centre
 Distributor: Allégt. 27
              N-5007 Bergen
              Norway
              Telephone: +47 5558 2954
              Telefax: +47 5558 9470
              Electronic mail: icame@hit.uib.no

Each corpus was produced by a different research team, as explained below.


The Brown Corpus
================

The Brown Corpus was compiled in the early 1960s at Brown University, USA,
under the direction of W. Nelson Francis and Henry Kucera.

The WordCruncher version of the Brown Corpus was made by Randall Jones,
Brigham Young University.


The LOB Corpus
==============

The Lancaster-Oslo/Bergen (LOB) Corpus was compiled in the 1970s under the
direction of Geoffrey Leech, University of Lancaster, and Stig Johansson,
University of Oslo. The tagging was done by researchers at Lancaster, Oslo,
and Bergen. The principal members of the research teams were:

Lancaster: Geoffrey Leech, Roger Garside, Eric Atwell, Ian Marshall

Oslo/Bergen: Stig Johansson, Knut Hofland, Mette-Cathrine Jahr


The Kolhapur Corpus
===================

The Kolhapur Corpus is an Indian English counterpart of the Brown and LOB
corpora, compiled under the direction of S. V. Shastri, Shivaji University,
Kolhapur. It contains 500 text samples selected from English texts printed
in India in 1978.

The WordCruncher version of the Kolhapur Corpus was made by Gerhard
Leitner, Free University of Berlin, and Knut Hofland, HIT-Centre,
University of Bergen.


The London-Lund Corpus
======================

The London-Lund Corpus contains 100 spoken English texts of some 5,000
words each, collected and transcribed at the Survey of English Usage,
University College London, under the direction of Randolph Quirk, and
computerized at the University of Lund, under the direction of Jan
Svartvik (13 of the texts were computerized at University College London,
under the direction of Sidney Greenbaum). The principal members of the
research teams were:

London: Sidney Greenbaum, Andrew Rosta, Akiva Quinn

Lund: Bengt Altenberg, Mats Eeg-Olofsson, Lennart Månsby, Bengt Oreström,
Jan Svartvik, Cecilia Thavenius

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


The Helsinki Corpus of English Texts: Diachronic Part
=====================================================

This corpus was compiled at the University of Helsinki, under the direction
of Matti Rissanen. Other members of the research team were:

Old English: Leena Kahlas-Tarkka, Matti Kilpiö, Ilkka Mönkkönen, Aune
Österman

Middle English: Inkeri Blomstedt, Juha Hannula, Mailis Järviö, Leena
Koskinen, Saara Nevanlinna, Tesma Outakoski, Päivi Pahta, Kirsti Peitsara,
Irma Taavitsainen

Early Modern English: Merja Kytö, Anneli Meurman-Solin, Terttu Nevalainen,
Helena Raumolin-Brunberg, Ritva Tiusanen

The project secretary was Merja Kytö, and the research assistants who keyed
in and proofread texts were:

Kirsi Heikkonen, Jussi Klemola, Asta Kuusinen, Tuula Lehtonen, Tom
Löfström, Arja Nurmi, Minna Palander, Tiina Selki, Päivi Öhman.

The WordCruncher version of the Helsinki Corpus is made by Merja Kytö,
University of Helsinki.


Freiburg-LOB Corpus of British English (FLOB)
=============================================

In 1991, Christian Mair, at Englisches Seminar at
Albert-Ludwigs-Universität Freiburg, took the initiative to compile a set
of corpora that would match the well-known and widely used Brown and LOB
corpora with the only difference that they should represent the language of
the early 1990s. The project started in April 1991. To speed up the process
of compilation, Christian Mair was granted additional funding by the DFG
(German Research Foundation) for the years 1994-1996.

The following have all been involved in the often tedious process of typing
the text-extracts and/or the proofreading: Birgit Felleisen, Heike Fiedler,
Elke Frings, Elke Gebhard, Dorothee Graf, Ulrike Günther, Matthias
Kaufmann, Manfred Krug, Christoph Lindner, Isolde Mattmüller-Ofori, Nadja
Nesselhauf, Christine Oesterlee, and Heike Schnitzler. Heike Fiedler helped
in the final stages of proofreading and the editing of the manual.

Special thanks go to Christoph Lindner, who wrote the programs that were
used in assigning the category references and line-numbers to the
ASCII-texts, and to Heide Peper-Ludwig, the main troubleshooter in
computer-related emergencies.

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


Freiburg-Brown Corpus of American English (Frown)
=================================================

In 1991, Christian Mair, at Englisches Seminar at
Albert-Ludwigs-Universität Freiburg, took the initiative to compile a set
of corpora that would match the Brown and LOB corpora with the only
difference that they should represent the language of the early 1990s.
Work on the new Freiburg-Brown Corpus, Frown, began in 1992. To speed up
the process of compilation, Christian Mair was granted additional funding
by the DFG (German Research Foundation) for the years 1994-1996.

The following have all been involved in the often tedious process of typing
the text-extracts and/or the proofreading for the Frown corpus: Jost
Burger, Birgit Felleisen, Elke Gebhard, Dorothee Graf, Ulrike Günther,
Matthias Kaufmann, Manfred Krug, Christoph Lindner, Tobias Maier, Nadja
Nesselhauf, Christine Oesterlee, Stefanie Rapp, Heike Schnitzler, and Anne
Schröder. Heike Fiedler and Nicole Knäble helped in the final stages of
proofreading and the editing of the manual.

Special thanks go to Christoph Lindner, who wrote the programs that were
used in assigning the category references and line-numbers to the
ASCII-texts, and to Heide Peper-Ludwig, the main troubleshooter in
computer-related emergencies.

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


The Australian Corpus of English (ACE)
======================================

The Australian Corpus of English (ACE) was compiled in the Department of
Linguistics at Macquarie University, NSW, Australia, from 1986 on. It was
supported by a small grant 1988-1989 from the Australian Research Grants
Council, and by a series of grants from Macquarie University. Other support
came from the National Languages and Literacy Institute of Australia and
the University of New South Wales. The project was conceived by Pam Peters,
Peter Collins and David Blair, and was carried through with the help of a
number of research assistants, notably Alison Moore, Elizabeth Green,
Robert Jenkins, Catherine Martin, Diana Grace, Heather Middleton, Wendy
Young and Adam Smith. Computational help and advice were provided by Harry
Purvis and Steve Cassidy, and the project enjoyed continuous infrastructure
support from Macquarie's Speech, Hearing and Language Research Centre.

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


The Wellington Corpus of Written New Zealand English
====================================================

The Wellington Corpus was developed in the Department of Linguistics
at Victoria University of Wellington in the years 1986-1992.

The idea of a New Zealand corpus had been around since the first half of
the 1980s, was canvassed at a Linguistic Society of New Zealand Conference
in Wellington in 1985 by Derek Davy, and was warmly supported by the
Linguistic Society. In 1986 planning for such a project was begun by a
group of people interested in the idea of a corpus from the Department of
Linguistics and the English Language Institute. In 1987 a tentative start
was made on collecting the material for the Press section.

Laurie Bauer took on the task of directing the collection of the written
material.

The project has been generously supported by the Internal Grants Committee
of Victoria University of Wellington, and by the (now defunct) University
Grants Committee.

We have also been helped considerably by the staff of Victoria University's
Computer Services Centre, under the directorship of Frank March, and we
should like to express our appreciation of the effort made by them in aid
of this project.

We were fortunate to be able to employ a number of current and former
Linguistics students as research assistants, and it is their work and care
which have brought the project to a successful conclusion so quickly. I
should like to thank for their hard work on this corpus Anna Adams, Debra
Beckett, Rachel Dickinson, Katrina Foster, Lisa Matthewson, Ruth Pemberton,
Mary Roberts, Shelley Robertson, Jane Sayers, Robert Sigley, and Rowena
Simpson.

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


The Spoken English Corpus (SEC)
===============================

The SEC project was supported in 1984-1985 by the University of Lancaster
Humanities Research Fund and by IBM UK Ltd., and subsequently by IBM UK
Ltd. IBM have not only given financial support, but have actively
participated in the project.

A large number of people have contributed to the project:

The project team comprised Dr G Knowles (University of Lancaster), Dr P
Alderson (IBM), Dr B Williams (IBM) and L Taylor (University of Lancaster).
Prof G Leech (University of Lancaster) and Prof G Kaye (IBM) initiated the
project and maintained an active collaborative role in it. Additional help
was provided by A Seil and N Campbell (IBM), and S Elliot, C Grover, and Dr
E Briscoe (University of Lancaster).

The majority of texts in the corpus were obtained from the BBC, and thanks
must go to Norma Jones in the BBC Sound Archives for her help in organising
contracts, contacting speakers, and providing information for the three
years of the project.

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


The Wellington Corpus of Spoken New Zealand English (WSC)
=========================================================

Project Director
Janet Holmes

Corpus Research Advisory Group

Laurie Bauer, Allan Bell, David Britain, Graeme Kennedy, Chris Lane, Miriam
Meyerhoff and Maria Stubbe.

Corpus Managers
Miriam Meyerhoff 1989-1991
Maria Stubbe 1991-1992
Raewyn Whyte 1992-1993
Sue Petris 1993-1994
Jane Pilkington 1994
Jennifer O'Brien 1994
Gary Johnson 1994-1997
Bernadette Vine 1997-

Transcribers
Alexander Tripp, Gary Johnson, Martin Paviour-Smith
Angela Lavender, Jane Pilkington, Meg Sloane
Anissa Bain, Jen Hay, Michaela Stirling
Anita Easton, Jennifer O'Brien, Nina Flinkenberg
Ben Taylor, Jenny Allan, Penny Wilson
Bernadette Vine, Kate Kilkenny, Rachel Lum
Camille Plimmer, Kate Wadsworth, Rowena Samaraweera
Claire Solon, Kerry McCarty, Sarah Dreyer
Elizabeth Smith, Lynnette Sollitt-Morris, Shelley Robertson
Esther Griffiths, Margaret Cain, Sue Petris

Research Assistants
Alexandra Manolis, Keri Shepherd, Meredith Marra
Anna Adams, Louise Burns, Ruth Katene
Anthony Singleton, Maria Aptekar, Robert Sigley
Clare Taylor, Maria Tuinman, Shannon Marra
Inga Fillary, Maryann Nesbit, Sue Jones

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


The Bergen Corpus of London Teenage Language (COLT)
===================================================

The project was initiated by Anna-Brita Stenström in collaboration with
Leiv Egil Breivik and was carried through with the help of postgraduate
students employed as research assistants, notably Gisle Andersen, Vibecke
Haslerud, Kristine Hasund, Migle Miliauskaite, Kristine Monstad, Ingrida
Strazdaite, Nina Sørli, Ingrid Thompson and Hanne Aas. In addition, Lars
Johannessen was engaged for the preparation of the material for
text-to-sound conversion, which was completed by Tony Robinson at
SoftSound, St Albans.

We are extremely grateful to the Department of Education in London for
suggesting suitable London schools for collecting the material; to the
Longman Group, London, not only for letting us use the method of corpus
collection that was used for the collection of the British National Corpus
but also for carrying out the orthographic transcription; and finally to
the researchers at Lancaster University, in particular Elizabeth Eyes, for
doing the word class tagging.

The project could hardly have been carried through without the assistance
of Knut Hofland at The Norwegian Computing Centre for the Humanities and,
at a later stage, Manfred Thaller at the Centre for Humanities Information
Technologies Research, both at the University of Bergen.


The Helsinki Corpus of Older Scots
==================================

The compiler of the corpus is Anneli Meurman-Solin, University of Helsinki.
Research assistants were Kirsi Heikkonen and Arja Nurmi. The compiler would
like to thank Matti Rissanen, Merja Kytö and A.J. Aiken for support in the
compilation.


The Corpus of Early English Correspondence
==========================================

The Corpus of Early English Correspondence (CEEC) and the Corpus of Early
English Correspondence Sampler (CEECS) have been compiled by the
Sociolinguistics and Language History project team at the Department of
English, University of Helsinki. The project has been funded by the Academy
of Finland (1993-95) and the University of Helsinki (1996-98). The team is
led by Professor Terttu Nevalainen and includes senior researcher Helena
Raumolin-Brunberg, and researchers Jukka Keränen, Minna Nevala, Arja Nurmi
and Minna Palander-Collin. We have been helped in the compilation of the
corpus by Kirsi Heikkonen, and in the proofreading by Alistair
Melville-Smith, Taru Nurmi, Arja-Liisa Rossi, Reza Sanatnama, Heli Tissari
and Anne Virolainen.


The Newdigate Newsletters
=========================

Compiled by: Philip Hines, Jr., Norfolk, VA USA

Advice and help have come to me from many friends, colleagues, and former
students, all of which I gratefully acknowledge. I wish especially to thank
Laetitia Yeandle, Manuscript Curator at the Folger Library; Garland F.
White III, former Director of the Computer-Based Laboratory for Instruction
and Analysis at Old Dominion University; and Henry L. Snyder of the
University of California, Riverside, Director of "The Eighteenth Century
Short Title Catalogue--North America," for much very fundamental aid. I
thank the Research Foundation and the Research and Publication Committees
of the College of Arts and Letters and of the Department of English (all of
Old Dominion University) for grants-in-aid in support of this project. And
for their faithful and effective help in transcribing the letters I thank
Eric Bing, Wayne E. Bowman, Kevin Farley, Frances Johnson, Daniel Martin,
Gwen McAlpine, Alison Rand, Nancy Rector, and Mark Thorsen.

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


The Lampeter Corpus of Early Modern English Tracts
==================================================

The Lampeter project was initiated in 1991 by Prof. Dr. Josef Schmied and
Eva Hertel at Bayreuth University and moved with them to Chemnitz in 1993.
It has been funded by the Deutsche Forschungsgemeinschaft (DFG), the German
Research Association, since 1994. Travel grants made available by the
Deutscher Akademischer Austauschdienst (DAAD), the German Academic Exchange
Service, have made possible research collaboration with the English
Department at Helsinki University and the Department of Linguistics &
Modern English Language at Lancaster University (Gerald Knowles, Tony
McEnery and Andrew Wilson) on questions of corpus compilation and
annotation. The current compilers are Claudia Claridge and Rainer Siemund,
both of them linguists with an accompanying major in history. Eva Hertel
was responsible for the early stages of compilation. Student assistants:
Jeannine Stöhrer, Angelika Giesecke, Astrid Lohse, Anja Ficker, Daniela
Zierold, Mario Nyeki and Manuela Sachs. The corpus passages in Greek script
were transliterated by Daniela Schindling and the items in Semitic by Gerry
Knowles. Hildegard Schäffler provided the ESTC-information for the headers.

The markup follows the guidelines of the Text Encoding Initiative (TEI)
and uses the Standard Generalized Markup Language (SGML); it was done in
collaboration with Lou Burnard and the Oxford Text Archive.

The WordCruncher version has been made by Knut Hofland, University of
Bergen.


Lancaster Parsed Corpus
=======================

Roger Garside, Geoffrey Leech and Tamás Váradi, Lancaster University.

We are grateful for help received from the following sources:

(a) The development of the Parsed Corpus was originally undertaken in
1983-6, with the support of Research Grant GR/C/47700 funded by the Science
and Engineering Research Council (SERC).

(b) The automatic probabilistic parser which produced the parses (prior to
post-editing) derived its frequency data from another parsed database of
sentences from the LOB Corpus, known as the Lancaster-Leeds Treebank,
manually parsed by Geoffrey Sampson (see R. Garside, G. Leech and G.
Sampson, The Computational Analysis of English: a Corpus-based Approach,
London: Longman, 1987, Chapter 7). The Lancaster Parsed Corpus implements a
simplified version of the parsing scheme more fully instantiated in
Sampson's treebank.

(c) The post-editing of the corpus was undertaken by a number of research
students at Lancaster. We particularly acknowledge the major post-editing
work undertaken by Heather Kempson and by Srikant Sarangi. Finally, the
whole corpus was thoroughly checked and corrected by Tamás Váradi.

(d) Steve Fligelstone and Andrew Wilson gave invaluable help in the final
stages of checking and producing the corpus.

(e) A number of errors were reported by Qiao Hong Liang of the University
of Queensland. Corrections were made to the corpus in April 1995.


The International Corpus of English - East African component
============================================================

The East African component of The International Corpus of English (ICE-EA)
is a computerized collection of spoken and written texts from Kenya and
Tanzania. It is the result of a project started in 1989 at the University
of Bayreuth and continued from 1995 at the Chemnitz University of
Technology within the framework of the Special Research Programme on
Africa, which was financially supported by the German Research Foundation
(DFG).

The following team of researchers are responsible for the compilation of
ICE-EA:

Diana Hudson-Ettle (Co-ordinator), Barbara Krohne (Assistant Co-ordinator),
Josef Schmied (Project Director). Our work would not have been possible
without the help of many friends and colleagues during fieldwork but we
would like to mention and thank especially Casmir Rubagumya (University of
Dar es Salaam), who gained access to and provided the main part of the
Tanzanian spoken data, Eunice Nyamasyo (Kenyatta University) and Kembo Sure
(Moi University), who were of assistance in acquiring some of the Kenyan
texts.

We should also like to express our gratitude to Eva Hertel, colleague and
PhD scholar, for her support, Paul Skandera, PhD scholar, for the material
he provided from his stay in Kenya, and Jemimah Mwakisha, the Kenyan
journalist, for her part in helping us to obtain written texts and
sociolinguistic information about the authors.

A number of undergraduate student assistants helped us by typing, scanning
and proofreading the corpus texts. We thank them and also those who
undertook the first stages in the particularly time-consuming and often
quite demanding task of transcribing the spoken texts. Special mention must
be made here of Gabriele Engelhardt, Astrid Lohse, Dirk Schmerschneider and
Katrin Voigt.


Innsbruck Computer-Archive of Machine-Readable English Texts (ICAMET)
=====================================================================

Compiler: Manfred Markus, Institut für Anglistik, Universität Innsbruck.


Polytechnic of Wales Corpus
===========================

Compiled by Robin Fawcett and Michael R. Perkins, Polytechnic of Wales,
Pontypridd. Handbook by Clive Souter, University of Leeds.


WordCruncher ViewLtd 4.5 DOS
============================

Copyright 1985-92 Brigham Young University. Licensed from Electronic Text
Corporation/CD Danmark A/S, Copenhagen.


LEXA and Linguafont
===================

The programs are written by Raymond Hickey, Essen University.


Qwick
=====

Qwick is a Java application which uses the CUE system, which was originally
developed at Birmingham University by Oliver Mason and John Sinclair.


WordSmith
=========

The program is written by Mike Scott at Liverpool University.


Textual Analysis Computing Tools (TACT)
=======================================

TACT is owned and managed by the University of Toronto and the following
principals of the TACT Group, all members of the University:

   * John Bradley: Computing and Communications
   * Lidio Presutti: Humanities Programmer, Computing and Communications
     (1983-1991)
   * Michael Stairs: Centre for Computing in the Humanities, Faculty of Arts
     and Science
   * Ian Lancashire: Department of English

Full credits may be found in each program by pressing F6, or in the case of
UseBase, Ctrl-F1. The credits for the Install program are as follows:

PRINCIPAL FOR INSTALL 2.1

Michael Stairs - Designer-Programmer, Vers 2.1

Copyright for this program is held by the University of Toronto and the
above named principal of the TACT Group, all members of the University. The
system is managed by the TACT Group, which includes all principals and the
following University members:

   * John Bradley, UTCC, project manager, principal designer and founding
     system architect (1984-mid-92)
   * Edward Heinemann, Department of French, usability testing (1991-)
   * John C. Hurd, Faculty of Divinity, Trinity College, Chair, Software
     Development Committee (1986-89)
   * Ian Lancashire, Department of English, project manager (mid-1992-),
     design (1990-), project administration (IBM cooperative), consultant
     and usability testing (1984-)
   * Willard McCarty, CCH, usability testing and design (1991-)
   * Lidio Presutti, UTCC, designer-programmer (1987-91)
   * Michael Stairs, CCH, designer-programmer (1991-92), principal system
     designer-programmer (1993-)
   * T. R. Wooldridge, Department of French, Chair, Software Development
     Committee (1990-91), usability testing (1991-)