Factual Inconsistencies in Multilingual Wikipedia Tables

📅 2025-07-24
🤖 AI Summary
This study addresses cross-lingual factual inconsistency in multilingual Wikipedia tables: systematic discrepancies in the structured information that different language editions give on the same topic, which undermine the reliability of that knowledge and the robustness of AI systems trained on Wikipedia. We propose the first systematic typology of such inconsistencies and develop a framework for cross-lingual table alignment and consistency evaluation that combines NLP techniques with structural table alignment to automate table collection, semantic alignment, and discrepancy detection. Quantitative and qualitative analyses on a multilingual sample dataset show that factual inconsistencies are pervasive and follow diverse patterns. The work establishes a new methodology for consistency verification in multilingual knowledge bases and provides empirical grounding and a scalable technical pathway toward trustworthy AI systems.

📝 Abstract
Wikipedia serves as a globally accessible knowledge source with content in over 300 languages. Although the different language versions cover the same topics, they are written and updated independently. This leads to factual inconsistencies that can undermine the neutrality and reliability of the encyclopedia and of AI systems, which often rely on Wikipedia as a main training source. This study investigates cross-lingual inconsistencies in Wikipedia's structured content, with a focus on tabular data. We developed a methodology to collect, align, and analyze tables from multilingual Wikipedia articles and defined categories of inconsistency. We apply various quantitative and qualitative metrics to assess multilingual alignment on a sample dataset. These insights have implications for factual verification, multilingual knowledge interaction, and the design of reliable AI systems that leverage Wikipedia content.
Problem

Investigates factual inconsistencies in multilingual Wikipedia tables
Develops methodology to analyze cross-lingual table alignment
Assesses impacts on AI reliability and knowledge verification
Innovation

Collect, align, and analyze multilingual Wikipedia tables
Define categories of inconsistency in tabular data
Apply quantitative and qualitative metrics to assess alignment
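The summary does not detail how the collect, align, and analyze steps are implemented. As a rough illustration only, the sketch below assumes two language versions of a table have already been collected and aligned (rows matched by a shared key, for instance an entity identifier, and columns mapped to a common schema), then flags cell-level discrepancies and computes a simple agreement rate over cells present in both tables. The names `compare_tables` and `normalize`, the discrepancy labels, the toy data, and the agreement metric are all hypothetical; they are not the authors' typology or metrics.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: row key -> {column name -> cell text}.
Table = Dict[str, Dict[str, str]]
Discrepancy = Tuple[str, Optional[str], str, Optional[str], Optional[str]]


def normalize(value: str) -> str:
    """Collapse whitespace and casefold so trivial formatting differences are ignored."""
    return " ".join(value.split()).casefold()


def compare_tables(a: Table, b: Table) -> Tuple[List[Discrepancy], Optional[float]]:
    """Compare two tables whose row keys and column names are already aligned.

    Returns (row, column, label, value_a, value_b) discrepancies and the fraction
    of cells present in both tables whose normalized values agree (None if no
    cell is shared). The labels are illustrative placeholders only.
    """
    discrepancies: List[Discrepancy] = []
    shared = agree = 0
    for row in sorted(a.keys() | b.keys()):
        if row not in b:
            discrepancies.append((row, None, "row_missing_in_b", None, None))
            continue
        if row not in a:
            discrepancies.append((row, None, "row_missing_in_a", None, None))
            continue
        for col in sorted(a[row].keys() | b[row].keys()):
            va, vb = a[row].get(col), b[row].get(col)
            if va is None or vb is None:
                # Cell exists in only one language version.
                discrepancies.append((row, col, "cell_missing", va, vb))
                continue
            shared += 1
            if normalize(va) == normalize(vb):
                agree += 1
            else:
                discrepancies.append((row, col, "value_mismatch", va, vb))
    agreement = agree / shared if shared else None
    return discrepancies, agreement


if __name__ == "__main__":
    # Toy, made-up tables standing in for two aligned language editions.
    table_en = {"City A": {"population": "1000", "founded": "1237"},
                "City B": {"population": "500"}}
    table_de = {"City A": {"population": "1200", "founded": "1237 "},
                "City C": {"population": "300"}}
    found, rate = compare_tables(table_en, table_de)
    for item in found:
        print(item)
    print(rate)  # → 0.5
```

Exact string matching after normalization is deliberately naive; real cross-lingual comparison would also need to handle number formats, units, date conventions, and translated entity names, which this sketch does not attempt.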
Silvia Cappa
CNR ISTC
Lingxiao Kong
Fraunhofer Institute for Applied Information Technology FIT
Pille-Riin Peet
Tallinn University of Technology
Fanfu Wei
EURECOM
Yuchen Zhou
Technical University of Munich
Jan-Christoph Kalo
Assistant Professor, INDElab, University of Amsterdam
Knowledge Graphs, Natural Language Processing, Knowledge Integration