Diversity and bias in DBpedia and Wikidata as a challenge for text-analysis tools

Authors

DOI:

https://doi.org/10.5282/o-bib/5894

Keywords:

Diversity, Bias, DBpedia, Wikidata, Automated text analysis, Representation

Abstract

Diversity Searcher is a tool originally developed to help analyse diversity in news media texts. It relies on automated content analysis and therefore rests on prior assumptions and on design choices related to diversity. One such design choice is the external knowledge source or sources used. In this article, we discuss the implications that these sources can have for the results of content analysis. We compare two data sources that Diversity Searcher has worked with, DBpedia and Wikidata, with respect to their ontological coverage and diversity, and describe the implications for the resulting analyses of text corpora. We present a case study of the relative over- or underrepresentation of Belgian political parties between 1990 and 2020. In particular, we found a staggering overrepresentation of the political right in the English-language DBpedia.

Published

2023-05-16

Issue

Section

Full papers

How to Cite

Diversity and bias in DBpedia and Wikidata as a challenge for text-analysis tools. (2023). O-Bib. Das Offene Bibliotheksjournal Herausgeber VDB, 10(2), 1-12. https://doi.org/10.5282/o-bib/5894