This directory contains Perl scripts for use with ECI/MCI.

-----------
ecipath.pl
-----------

Used by bin/unix/eci (q.v.) to compute the appropriate setting for
SGML_PATH for a given ECI/MCI component.

-----------
mul11txt.pl
-----------

Program for use with the MUL11 corpus.  Due to lack of time, this
corpus is provided in its original form: it is coded in EBCDIC and
contains DisplayWrite(?) formatting codes.  mul11txt.pl takes a set of
files from the MUL11 (credit_suisse) ECI corpus, converts the text
from EBCDIC to the ISO-LATIN-1 character set, and removes formatting
and control codes.  No attempt is made to decode these formatting
codes into SGML.  The script concatenates all the files, separates
them with <div0 type=file n={original file name}>, and prints the
result to the standard output.

The result is not perfect, but it provides most of the text with the
minimum of garbage.

Usage: perl mul11txt.pl <files>

For example 

    perl mul11txt.pl $ECI_ROOT/data/eci4/mul11/original/german/*/*

will get all the German text in this corpus.
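
The conversion described above can be sketched as follows.  This is an
illustrative Python sketch, not the Perl code in mul11txt.pl.  The
EBCDIC code page (cp500 here), the set of characters treated as control
codes, and the names ebcdic_to_text and wrap_file are all assumptions.
It makes no attempt to recognise DisplayWrite formatting sequences, so
it is likely to leave more garbage than the real script does.

```python
import re
import sys

# Assumption: code page 500 ("international" EBCDIC). The corpus may
# well use a different EBCDIC variant.
EBCDIC_CODEC = "cp500"

# C0 and C1 control characters, except the ordinary newline.
CONTROL_CHARS = re.compile(r"[\x00-\x09\x0b-\x1f\x7f-\x9f]")


def ebcdic_to_text(raw: bytes) -> str:
    """Decode EBCDIC bytes and drop control characters."""
    text = raw.decode(EBCDIC_CODEC)
    # EBCDIC "next line" decodes to U+0085; keep it as a plain newline.
    text = text.replace("\x85", "\n")
    return CONTROL_CHARS.sub("", text)


def wrap_file(name: str, raw: bytes) -> str:
    """Prefix one converted file with the <div0> separator."""
    return f"<div0 type=file n={name}>\n{ebcdic_to_text(raw)}\n"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            converted = wrap_file(path, handle.read())
        # Write ISO-LATIN-1 bytes regardless of the terminal encoding.
        sys.stdout.buffer.write(converted.encode("latin-1", "replace"))
```

Saved under a hypothetical name such as mul11sketch.py, it would be run
in the same way as the Perl script: python mul11sketch.py <files>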

-----------
substent.pl
-----------

Used by bin/unix/textonly (q.v.) to extract from a .ent file all the
low-level entity definitions which can usefully be expanded when
viewing just the text of a .eci file.

-----------
post.pl
-----------

A version of sgmls.pl as provided with sgmls, which does its best to
reconstruct a normalised version of the input to sgmls from its
output.  Needs lots of switch settings to be much use -- see
bin/unix/dopost for an example which was used to normalise many ECI/MCI
components.


-----------
transgrk.pl
-----------

Program for use with the GRE01 corpus, or more generally for
ISO-LATIN-7 coded Greek files.  It produces a transliterated ASCII
version which may be easier to read if you do not have ISO-LATIN-7
fonts available.

Usage: perl transgrk.pl [-latex -stressed] <files>

Transliterates <files> and writes them to the standard output.

Options:

 -latex:	Produce LaTeX input with Greek characters (with stresses).

 -stressed:	Produce an ASCII transliteration, with apostrophes after 
		stressed (non-capital) letters denoting the stress.

 <default>:	Produce an ASCII transliteration, omitting the stresses.
		This is easier for Greeks to read than -stressed, but
		contains less information.

The transliteration is the one used by Cosmos News, a regular poster of
news bulletins in the newsgroup soc.culture.greek.  All Greek characters
are represented by ASCII letters, and the capital and lowercase forms of
each Greek character map to the upper- and lowercase forms of the same
Latin letter.
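
The default and -stressed modes can be sketched roughly as below.  This
is an illustrative Python sketch, not the Perl code in transgrk.pl.
The mapping table contains only letters that can be read off the
examples in this README; letters not attested here (for instance delta,
zeta, xi and psi) are left untouched rather than guessed.  The function
name transliterate and the table name GREEK_TO_ASCII are hypothetical.

```python
import unicodedata

# Partial table: only letters attested in this README's examples.
# Anything missing is passed through unchanged (an assumption of this
# sketch, not the behaviour of the real script).
GREEK_TO_ASCII = {
    "α": "a", "β": "b", "γ": "g", "ε": "e", "η": "h", "θ": "q",
    "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ο": "o",
    "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y",
    "φ": "f", "χ": "x", "ω": "w",
}

ACUTE = "\u0301"  # combining mark that NFD splits off stressed vowels


def transliterate(raw: bytes, stressed: bool = False) -> str:
    """Transliterate ISO-8859-7 bytes to (mostly) ASCII text."""
    text = unicodedata.normalize("NFD", raw.decode("iso8859_7"))
    out = []
    for ch in text:
        if unicodedata.combining(ch):
            # Keep the stress as an apostrophe only after a lowercase
            # letter, and only when requested; drop other marks.
            if ch == ACUTE and stressed and out and out[-1].islower():
                out.append("'")
            continue
        lower = ch.lower()
        if lower in GREEK_TO_ASCII:
            latin = GREEK_TO_ASCII[lower]
            out.append(latin if ch == lower else latin.upper())
        else:
            out.append(ch)
    return "".join(out)
```

For instance, transliterate(data) corresponds to the default mode and
transliterate(data, stressed=True) to -stressed, where data holds the
bytes of an ISO-LATIN-7 file.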

This script repairs certain detectable misuses of meta-6 (Greek A)
in the ECI corpus:

 (a) as an apostrophe, after a lowercase letter
 (b) as a stress mark, before (presumably word-initial) capital vowels
     that are followed by a lowercase letter
 
See gre01d05.eci for examples:

 Type (a), [line 2 of first paragraph]
 parei meros sA ayton arxisan tis prospaqeies gia thn anorqwsh twn
              ^

 Type (b), about a page down:
 toy sta xronia 1980-86 me ena meso ryqmo 2,8%. AOson afora th
                                                ^

The script cannot repair meta-6 used as a word-initial apostrophe, or
word-finally after a capital letter, since these are environments
where a genuine A could appear.  One example that cannot be corrected
[line 1 of first paragraph]:

 Meta to telos toy BA Pagkosmioy polemoy oles oi xwres poy eixan
                    ^		(should be B', meaning "second").
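
The two repair rules, (a) and (b), can be sketched as regular
expression substitutions.  This is an illustrative Python sketch working
on decoded Greek text, not the Perl code in transgrk.pl.  It assumes
that "meta-6" is byte 0xB6, which ISO-8859-7 assigns to capital alpha
with tonos (U+0386); the function name repair_meta6 and the helper
constants are hypothetical.

```python
import re

# Assumption: "meta-6" is byte 0xB6, i.e. U+0386 once decoded.
MISUSED = "\u0386"

# Lowercase Greek letters, plain and accented.
LOWER = "α-ωάέήίόύώϊϋΐΰ"
CAPITAL_VOWELS = "ΑΕΗΙΟΥΩ"
ADD_STRESS = str.maketrans(CAPITAL_VOWELS, "ΆΈΉΊΌΎΏ")

# Rule (b): misused character, capital vowel, then a lowercase letter.
RULE_B = re.compile(f"{MISUSED}([{CAPITAL_VOWELS}])(?=[{LOWER}])")
# Rule (a): misused character directly after a lowercase letter.
RULE_A = re.compile(f"(?<=[{LOWER}]){MISUSED}")


def repair_meta6(text: str) -> str:
    """Apply repair rules (b) and (a) to decoded Greek text."""
    # (b) fold the stray character into a stress on the capital vowel.
    text = RULE_B.sub(lambda m: m.group(1).translate(ADD_STRESS), text)
    # (a) turn the stray character into an apostrophe.
    return RULE_A.sub("'", text)
```

As in the description above, a word-final occurrence after a capital
letter matches neither rule and is left unchanged.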


-----------
unpackfn.pl
-----------

Utility used by bin/unix/eci and oneeci to parse filenames.
