Wenn Algorithmen Zeitschriften lesen

Vom Mehrwert automatisierter Textanreicherung

Authors

Michael Gasser, Regina Wanger

DOI:

https://doi.org/10.5282/o-bib/2018H4S181-192

Keywords:

Librarianship, computational linguistics, automated text enrichment, Named Entity Recognition (NER), Named Entity Linking (NEL), OCR optimisation

Abstract

In cooperation with the Institute of Computational Linguistics at the University of Zurich (ICL UZH), the ETH Library Zurich carried out a pilot project on automated text enrichment. The pilot was based on full-text files from E-Periodica, the online platform for digitised Swiss journals. On a selected corpus of this OCR data, automated procedures were tested for three tasks: OCR correction, recognition of person, place and country names, and linking of the identified persons to the Integrated Authority File (Gemeinsame Normdatei, GND) used by German libraries. Overall, the results were very positive. The system used now serves as the basis for further expanding the ETH Library’s competence in this field. The plan is to automatically enrich the entire content of the E-Periodica platform and extend it with new functionalities, with the aim of offering researchers added value when gathering information. This article presents the project’s content, methodology and results and outlines the next steps.

Author Biographies

  • Michael Gasser, ETH Zürich, ETH-Bibliothek

    Head of Archives

  • Regina Wanger, ETH Zürich, ETH-Bibliothek

    Head of DigiCenter

Published

2018-12-10

Issue

Vol. 5 No. 4 (2018)

Section

Conference proceedings

How to Cite

Wenn Algorithmen Zeitschriften lesen: Vom Mehrwert automatisierter Textanreicherung. (2018). O-Bib. Das Offene Bibliotheksjournal Herausgeber VDB, 5(4), 181-192. https://doi.org/10.5282/o-bib/2018H4S181-192