Michigan Tradesman Dataset
This dataset is publicly available.
The full text, PDFs, and metadata offered as downloads above are derived from The Michigan Tradesman collection in the MSU Libraries' digital repository.
One zip file contains the full plain text; another, much larger, contains all of the PDFs; and a third contains metadata in two formats, MODS and Dublin Core. The MODS data is more complete and serves as the primary record for each newspaper issue. The Dublin Core data is less complete, but it is also less hierarchical and may be easier to read or parse. The Dublin Core metadata has also been converted to CSV.
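For readers working with the Dublin Core XML, here is a minimal Python sketch of how its flat structure can be read using only the standard library. It assumes each XML string holds a single issue record and that elements use the standard DCMI namespace `http://purl.org/dc/elements/1.1/`. The function name `read_dublin_core` is hypothetical, and the actual file layout inside the zip may differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard DCMI namespace for the Dublin Core element set (assumed to be
# the one used in the metadata files).
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_dublin_core(xml_text: str) -> dict[str, list[str]]:
    """Collect every dc:* element in one XML record, keyed by local name.

    Hypothetical helper. Dublin Core elements are repeatable (a record
    may carry several dc:subject values, for example), so each key maps
    to a list of values in document order. Empty elements are skipped.
    Assumes the string contains a single issue record; if a file holds
    many records, their fields would be merged together.
    """
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields: dict[str, list[str]] = defaultdict(list)
    for elem in root.iter():
        # ElementTree renders namespaced tags as "{namespace}localname".
        if elem.tag.startswith(prefix):
            local_name = elem.tag[len(prefix):]
            text = (elem.text or "").strip()
            if text:
                fields[local_name].append(text)
    return dict(fields)
```

Because Dublin Core elements can repeat, every field comes back as a list rather than a single string. The MODS records are more deeply nested, so reading them would mean walking the hierarchy instead of flattening it this way.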
The text was produced by OCR using Tesseract and has not been corrected, so its quality varies widely. Metadata has been applied at the issue level, not the article level.
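Because OCR quality varies so much from issue to issue, it can help to triage files before analysis. The sketch below is a crude heuristic and not part of the dataset. It computes the share of whitespace-separated tokens that look like ordinary words or numbers, since garbled OCR tends to produce tokens that mix letters, digits, and stray symbols. The function name `ocr_word_ratio` and its token rules are illustrative assumptions, and any threshold for "usable" text would need to be chosen by inspecting the files.

```python
import re

# A token counts as "clean" if, after trimming surrounding punctuation,
# it is a plain word (letters, optionally joined by ' or -) or a number
# such as 1883 or 3.50. These rules are an illustrative assumption.
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
TRIM = ".,;:!?\"'()[]$"


def ocr_word_ratio(text: str) -> float:
    """Return the fraction of tokens that look like words or numbers.

    Hypothetical quality proxy for uncorrected OCR output. Tokens made
    only of punctuation count as noise. Returns 0.0 for empty text.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    clean = 0
    for tok in tokens:
        core = tok.strip(TRIM)
        if WORD.fullmatch(core) or NUMBER.fullmatch(core):
            clean += 1
    return clean / len(tokens)
```

Numbers are accepted alongside words because a trade paper is likely to contain many prices and figures, which should not be penalized as OCR noise.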
If you have any questions or suggestions concerning this data, please send them to the Digital Scholarship Lab.