How Grounded is Wikipedia? A Study on Structured Evidential Support

📅 2025-06-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines the factual verifiability of Wikipedia content: whether the claims in an article are supported by its cited, publicly accessible sources. To this end, it introduces PeopleProfiles, a large-scale, multi-level dataset of claim support annotations on Wikipedia articles about notable people. The annotations link lead-section claims to supporting evidence in the article body and body claims to their cited sources, and the dataset is paired with a retrieval evaluation. Key findings include: (i) roughly 20% of lead claims are unsupported by the article body; (ii) roughly 27% of annotated body claims are unsupported by their publicly accessible cited sources; (iii) more than 80% of lead claims cannot be traced to those sources via annotated body evidence; and (iv) standard retrieval methods struggle to recover complex grounding evidence, even for claims that are supported. The work offers a benchmark for assessing Wikipedia's groundedness and for automated fact-checking.

📝 Abstract

Wikipedia is a critical resource for modern NLP, serving as a rich repository of up-to-date and citation-backed information on a wide variety of subjects. The reliability of Wikipedia -- its groundedness in its cited sources -- is vital to this purpose. This work provides a quantitative analysis of the extent to which Wikipedia *is* so grounded and of how readily grounding evidence may be retrieved. To this end, we introduce PeopleProfiles -- a large-scale, multi-level dataset of claim support annotations on Wikipedia articles of notable people. We show that roughly 20% of claims in Wikipedia *lead* sections are unsupported by the article body; roughly 27% of annotated claims in the article *body* are unsupported by their (publicly accessible) cited sources; and >80% of lead claims cannot be traced to these sources via annotated body evidence. Further, we show that recovery of complex grounding evidence for claims that *are* supported remains a challenge for standard retrieval methods.
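To make the three headline rates concrete, the following is a minimal sketch of how such fractions could be computed from claim-level annotations. It is not the PeopleProfiles schema or the authors' code: the `LeadClaim` and `BodyClaim` records, their field names, and the rule that a lead claim counts as traceable when at least one of its body-evidence claims is supported by a cited source are all assumptions made for illustration. The toy data is invented and does not reproduce the paper's numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BodyClaim:
    """Hypothetical record: a body claim and whether its cited source supports it."""
    claim_id: str
    supported_by_source: bool


@dataclass
class LeadClaim:
    """Hypothetical record: a lead claim and the body claims annotated as its evidence."""
    claim_id: str
    body_evidence: List[str] = field(default_factory=list)


def _fraction(count: int, total: int) -> float:
    # Return 0.0 for an empty collection instead of dividing by zero.
    return count / total if total else 0.0


def support_rates(lead_claims: List[LeadClaim],
                  body_claims: List[BodyClaim]) -> Dict[str, float]:
    source_ok = {b.claim_id: b.supported_by_source for b in body_claims}

    # Lead claims with no annotated evidence in the article body.
    lead_unsupported = sum(1 for c in lead_claims if not c.body_evidence)

    # Body claims whose cited source does not support them.
    body_unsupported = sum(1 for b in body_claims if not b.supported_by_source)

    # Assumed rule: a lead claim is traceable if at least one of its
    # body-evidence claims is itself supported by a cited source.
    lead_untraceable = sum(
        1 for c in lead_claims
        if not any(source_ok.get(e, False) for e in c.body_evidence)
    )

    return {
        "lead_unsupported_by_body": _fraction(lead_unsupported, len(lead_claims)),
        "body_unsupported_by_source": _fraction(body_unsupported, len(body_claims)),
        "lead_untraceable_to_source": _fraction(lead_untraceable, len(lead_claims)),
    }


# Invented toy annotations, for illustration only.
body = [
    BodyClaim("B1", supported_by_source=True),
    BodyClaim("B2", supported_by_source=False),
    BodyClaim("B3", supported_by_source=True),
]
lead = [
    LeadClaim("L1", ["B1"]),
    LeadClaim("L2", ["B2"]),
    LeadClaim("L3", []),
    LeadClaim("L4", ["B1", "B2"]),
]
rates = support_rates(lead, body)
```

Under this assumed rule the untraceable rate can never be lower than the rate of lead claims unsupported by the body, since a lead claim without body evidence has no path to a source. That ordering matches the gap between the roughly 20% and >80% figures reported above.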

Problem

Research questions and friction points this paper is trying to address.

Analyzes Wikipedia's grounding in cited sources

Measures unsupported claims in lead and body sections

Evaluates retrieval challenges for grounding evidence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset for claim support annotations

Quantitative analysis of Wikipedia's source grounding

Evaluation of retrieval methods for evidence recovery
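As a rough illustration of what evaluating evidence recovery can involve, the sketch below scores a ranked list of retrieved passages against a gold evidence set. The summary does not state which retrievers or metrics the paper uses, so the two measures here (recall at k, and a stricter check that every gold passage appears in the top k) and the function names are assumptions, not the authors' protocol. The stricter check is included because evidence spanning several passages is one plausible reading of "complex grounding evidence".

```python
from typing import Iterable, List


def recall_at_k(ranked_ids: List[str], gold_ids: Iterable[str], k: int) -> float:
    """Fraction of gold evidence passages that appear in the top-k results."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    return len(gold & set(ranked_ids[:k])) / len(gold)


def full_evidence_at_k(ranked_ids: List[str], gold_ids: Iterable[str], k: int) -> bool:
    """True only if every gold passage is in the top-k results (hypothetical strict metric)."""
    gold = set(gold_ids)
    return bool(gold) and gold <= set(ranked_ids[:k])


# Invented example: a claim whose evidence spans two passages.
ranked = ["p3", "p1", "p7", "p2"]
gold = {"p1", "p2"}

partial = recall_at_k(ranked, gold, k=2)             # only p1 is in the top 2
complete = full_evidence_at_k(ranked, gold, k=2)     # p2 is missing from the top 2
complete_at_4 = full_evidence_at_k(ranked, gold, k=4)
```

A retriever can score well on plain recall while rarely returning a complete evidence set, which is why the two measures are kept separate in this sketch.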
