This directory contains 'disks' 7-13.

Definitely the worst of the lot, with formatting codes still there, lots
of non-printing characters and a lot of garbage intruded. Each run of
garbage has been replaced with a marker containing nnn, where nnn is the
number of rubbish characters deleted. I've used the presence of
characters in the range 000-037 as the basis for locating rubbish, so
there are probably some short runs of rubbish which lack any of those
characters and which I've therefore missed.

There are LOTS of duplicate articles. I haven't even tried to clean
this up.

Some suggestions:

* is used to introduce formatting keywords, but also, irritatingly, as ñ,
  which does not occur.
È is EOL
À is some sort of formatting code
 is almost certainly ü
¹ is almost certainly à
¼ is < or >, depending!
­(n-dash) is almost certainly í
¦ is some sort of quote, or possibly a ¡! synthesis.
@ is almost certainly ¿
Ø introduces formatting?
â is almost certainly é
I'm not sure about å, ç or ê -- in some cases they're correct in French
  quotes.
ð is like ¶ in the fc files -- when followed by " it's probably í, but
  there are some other, less frequent cases as well.

The file char-hist is just that -- a histogram of character frequencies
in this sub-corpus.
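
For anyone wanting to redo or extend the clean-up, here is a rough sketch
of the two mechanical steps mentioned above: replacing control-character
runs with a length marker, and building a character histogram. It is not
the script that produced these files. It assumes 000-037 is octal (0x00
to 0x1F), treats tab, LF and CR as ordinary whitespace, and only counts
the control characters themselves rather than any garbage around them.
The names replace_rubbish and char_histogram and the "[deleted nnn]"
marker are placeholders made up for this example.

```python
import re
from collections import Counter

# Octal 000-037 is 0x00-0x1F. Assumption: tab (011), LF (012) and CR (015)
# are ordinary whitespace, so they are left out of the character class.
CONTROL_RUN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]+")


def replace_rubbish(text, marker="[deleted {n}]"):
    """Replace each run of control characters with a marker giving its length.

    The marker text is a placeholder, not the one used in these files.
    """
    return CONTROL_RUN.sub(lambda m: marker.format(n=len(m.group())), text)


def char_histogram(text):
    """Return (character, count) pairs, most frequent first."""
    return Counter(text).most_common()


if __name__ == "__main__":
    sample = "caf\x02\x03e"
    print(replace_rubbish(sample))  # caf[deleted 2]e
    for char, count in char_histogram("aab"):
        print(repr(char), count)
```

Running char_histogram over a whole file and looking at the high-byte
characters is a quick way to check the substitution guesses listed above
before applying any of them.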