Evaluation of Full-Text Data with Open Source Components
Optical character recognition, Full text, Evaluation, Historical newspapers, Newspaper, Digitization
Abstract
Several fully-fledged open source systems for full-text recognition are available today. Established open source tools from the fields of Data Science (DS), Information Retrieval (IR) and Natural Language Processing (NLP) can also be used to evaluate their results. After a brief discussion of common evaluation procedures and metrics, the application of such tools is illustrated using the DFG-funded project "Digitisation of historical German newspapers I" (Digitalisierung Historischer Deutscher Zeitungen I) at the University and State Library Saxony-Anhalt.
