When Algorithms Read Journals

On the Added Value of Automated Text Enrichment





Library science, computational linguistics, automated text enrichment, Named Entity Recognition (NER), Named Entity Linking (NEL), OCR optimisation


In cooperation with the Institute of Computational Linguistics at the University of Zurich (ICL UZH), the ETH Library Zurich carried out a pilot project on automated text enrichment. The pilot was based on full-text files from E-Periodica, the online platform for digitised Swiss journals. Using a selected corpus of this OCR data, automated procedures were tested for OCR correction, for the recognition of person, place and country names, and for linking identified persons to the Integrated Authority File (Gemeinsame Normdatei, GND) used by libraries in German-speaking countries. Overall, very positive results were achieved. The system used now serves as the basis for further expanding the ETH Library’s expertise in this field. The entire content of the E-Periodica platform is to be automatically enriched and extended with new functionalities, with the aim of offering researchers added value when gathering information. This article presents the project's content, methodology and results and outlines the next steps.

Author Biographies

  • Michael Gasser, ETH Zürich, ETH-Bibliothek

    Head of Archives

  • Regina Wanger, ETH Zürich, ETH-Bibliothek

    Head of DigiCenter


