Text Recognition as a Challenge in the Digitisation of Tables
Using the Example of the Belgian Historical Censuses Project (KU Leuven Libraries Economics and Business)
Digitisation, text recognition, OCR
Abstract
Censuses have been taken for more than 5,000 years. They were originally carried out for tax collection and military purposes, and only later used for scientific research. The first censuses that were available for research from the outset were carried out in Belgium in 1846 under the direction of Adolphe Quetelet, and many further censuses followed. Because analysing these censuses is very time-consuming owing to their extent and format, it makes sense to convert them into digital form by means of electronic text recognition (OCR). KU Leuven Libraries Economics and Business (Belgium) is currently working on a project that aims to offer the printed editions of the Belgian industrial censuses from 1846 to 1947 as Excel spreadsheets. This article addresses the challenges involved and describes the procedures used.
Copyright (c) 2020 André Davids
This work is licensed under a Creative Commons Attribution 4.0 International License.