Appendix 1: General flowchart of Tag Assignment Program
Notes
- If the word ends in "s apostrophe" then strip the apostrophe; if the word ends in "apostrophe s" then strip both characters (and any preceding full-stop).
- "Non-words" are the following:
a letter followed by zero or more digits (0 to 9), possibly followed by a single, double, or triple prime, tagged ZZ
a number* followed by "st", "nd", "rd" or "th", tagged OD
a number followed by "s", tagged CDS
a number containing "-", tagged CD-CD
a number followed by "apostrophe s", tagged CD$
a number followed (possibly) by a letter, tagged CD
a word containing a superscript or subscript, tagged &FO
a word containing letters and digits, but no hyphen, tagged &FO
*In this context, a "number" means a sequence of digits (0-9) perhaps also including ".", "," and "/".
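The "non-word" rules above can be sketched as an ordered list of patterns. This is only an illustrative sketch, not the program's actual code. It assumes that a "number" contains at least one digit besides ".", "," and "/", that a prime is typed as an apostrophe, and that "a number containing '-'" means numbers joined by hyphens. The names NUM, RULES and classify_non_word are hypothetical, and the superscript/subscript rule is left out because the text does not say how such characters are encoded.

```python
import re

# Assumption: a "number" is digits plus ".", "," and "/", with at least one digit.
NUM = r"[0-9.,/]*[0-9][0-9.,/]*"

# Rules are tried in the order the note lists them; the first full match wins.
# Assumption: a prime is typed as an apostrophe (up to three in a row).
RULES = [
    ("ZZ", re.compile(r"[A-Za-z][0-9]*'{0,3}")),
    ("OD", re.compile(NUM + r"(?:st|nd|rd|th)")),
    ("CDS", re.compile(NUM + r"s")),
    # Assumption: "containing '-'" is read as numbers joined by hyphens.
    ("CD-CD", re.compile(NUM + r"(?:-" + NUM + r")+")),
    ("CD$", re.compile(NUM + r"'s")),
    ("CD", re.compile(NUM + r"[A-Za-z]?")),
]


def classify_non_word(token):
    """Return the non-word tag for token, or None if no rule applies."""
    for tag, pattern in RULES:
        if pattern.fullmatch(token):
            return tag
    # Last rule: letters and digits mixed, with no hyphen.
    has_letter = any(c.isalpha() for c in token)
    has_digit = any(c.isdigit() for c in token)
    if has_letter and has_digit and "-" not in token:
        return "&FO"
    return None
```

The order matters: "1960s" must be tested against the CDS rule before the CD rule, since the CD rule (a number possibly followed by a letter) would otherwise accept it too.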
- The "standard" prefixes include "a-", "co-", "counter-", "de-", "hyper-", "mis-", "out-", "over-", "re-", "retro-", "super-", and "trans-".
- Words ending in "ches", "shes", "sses", "zzes", "oes", "xes" have the "es" removed; words with or more letters and ending in "ies" have the "ies" changed to "y"; words ending in "full-stop s" have both characters removed; other words ending in "s" (unless they end in "ss") have it removed.
- Tags that take -s are VB (becoming VBZ) and CD, NN, NNP, NNU, NP, NPL, NPT, NR (becoming CDS, NNS, NNPS, NNUS, NPS, NPLS, NPTS, NRS).
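The last two notes, on removing a final "s" and on the tags that take -s, can be sketched together as follows. This is a minimal sketch under stated assumptions, not the program's own code. The names strip_s, s_form_tags, S_TAGS and lexicon are hypothetical. The minimum length for the "ies" rule is not given in the text, so it is left as a required parameter (min_ies_len) rather than guessed. How the program combines the two steps is also an assumption: here the stem is looked up in a word list and each of its tags that takes -s is converted.

```python
# Endings whose final "es" is removed.
ES_ENDINGS = ("ches", "shes", "sses", "zzes", "oes", "xes")

# Tags that take -s, mapped to their -s forms, as listed in the note.
S_TAGS = {
    "VB": "VBZ", "CD": "CDS", "NN": "NNS", "NNP": "NNPS", "NNU": "NNUS",
    "NP": "NPS", "NPL": "NPLS", "NPT": "NPTS", "NR": "NRS",
}


def strip_s(word, min_ies_len):
    """Remove a final -s ending; min_ies_len is the unstated length threshold."""
    if word.endswith(ES_ENDINGS):
        return word[:-2]                 # "boxes" -> "box"
    if word.endswith("ies") and len(word) >= min_ies_len:
        return word[:-3] + "y"           # "ies" -> "y" for long enough words
    if word.endswith(".s"):
        return word[:-2]                 # full-stop and "s" both removed
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                 # "cats" -> "cat"
    return word                          # no -s ending, or ends in "ss"


def s_form_tags(word, lexicon, min_ies_len):
    """Assumed usage: look up the stem and convert the tags that take -s."""
    stem = strip_s(word, min_ies_len)
    if stem == word:
        return []
    return [S_TAGS[tag] for tag in lexicon.get(stem, []) if tag in S_TAGS]
```

Here lexicon stands for any mapping from a word to its list of possible tags; tags of the stem that do not take -s are simply dropped.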