When Algorithms Read Journals
On the Added Value of Automated Text Enrichment
DOI:
https://doi.org/10.5282/o-bib/2018H4S181-192
Keywords:
Librarianship, computational linguistics, automated text enrichment, Named Entity Recognition (NER), Named Entity Linking (NEL), OCR optimisation
Abstract
In cooperation with the Institute of Computational Linguistics at the University of Zurich (ICL UZH), the ETH Library Zurich carried out a pilot project on automated text enrichment. The pilot was based on full-text files from E-Periodica, the online platform for digitised Swiss journals. On a selected corpus of this OCR data, automated procedures were tested for three tasks: OCR correction; recognition of person, place and country names; and linking of the identified persons to the Gemeinsame Normdatei (GND), the German common authority file for libraries. Overall, very positive results were achieved. The system now serves as a basis for further expanding the ETH Library’s competence in this field. The plan is to enrich the entire content of the E-Periodica platform automatically and to extend it with new functionalities, with the aim of offering researchers added value when gathering information. This article presents the project's content, methodology and results and outlines the next steps.
License
Copyright (c) 2018 Michael Gasser, Regina Wanger, Ismail Prada
This work is licensed under a Creative Commons Attribution 4.0 International License.