Scaling Truth: The Confidence Paradox in AI Fact-Checking

📅 2025-09-10
🤖 AI Summary
This paper identifies a “confidence paradox” in large language models (LLMs) used for fact-checking: smaller models show high confidence despite lower accuracy, whereas larger models are more accurate but less confident. Because resource-constrained organizations typically use smaller models, this risks systemic bias in information verification, with the largest performance gaps for non-English claims and claims originating from the Global South. Method: The authors evaluate nine LLMs (open- and closed-source, multiple sizes, diverse architectures, reasoning-based) under four prompting strategies on 5,000 real-world claims previously assessed by 174 professional fact-checking organizations across 47 languages, using over 240,000 human annotations as ground truth. Contribution/Results: The observed confidence-accuracy pattern resembles the Dunning-Kruger effect. The results establish a multilingual benchmark for future research and provide an evidence base for policy on equitable access to trustworthy, AI-assisted fact-checking.

📝 Abstract
The rise of misinformation underscores the need for scalable and reliable fact-checking solutions. Large language models (LLMs) hold promise in automating fact verification, yet their effectiveness across global contexts remains uncertain. We systematically evaluate nine established LLMs across multiple categories (open/closed-source, multiple sizes, diverse architectures, reasoning-based) using 5,000 claims previously assessed by 174 professional fact-checking organizations across 47 languages. Our methodology tests model generalizability on claims postdating training cutoffs and four prompting strategies mirroring both citizen and professional fact-checker interactions, with over 240,000 human annotations as ground truth. Findings reveal a concerning pattern resembling the Dunning-Kruger effect: smaller, accessible models show high confidence despite lower accuracy, while larger models demonstrate higher accuracy but lower confidence. This risks systemic bias in information verification, as resource-constrained organizations typically use smaller models. Performance gaps are most pronounced for non-English languages and claims originating from the Global South, threatening to widen existing information inequalities. These results establish a multilingual benchmark for future research and provide an evidence base for policy aimed at ensuring equitable access to trustworthy, AI-assisted fact-checking.
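The confidence-accuracy mismatch described in the abstract can be made concrete with a small calculation. The following is a minimal sketch, not the authors' evaluation code: it assumes each model verdict comes with a self-reported confidence in [0, 1] and a correctness flag against human ground truth. The `Prediction` type, the model labels, and the toy numbers are all hypothetical and chosen only to show the shape of the pattern.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Prediction:
    model: str         # hypothetical model label
    correct: bool      # verdict matched the human ground truth
    confidence: float  # model's self-reported confidence in [0, 1]


def confidence_gap(preds):
    """Map each model to (accuracy, mean confidence, confidence - accuracy)."""
    by_model = {}
    for p in preds:
        by_model.setdefault(p.model, []).append(p)
    summary = {}
    for model, rows in by_model.items():
        accuracy = mean(1.0 if r.correct else 0.0 for r in rows)
        confidence = mean(r.confidence for r in rows)
        summary[model] = (accuracy, confidence, confidence - accuracy)
    return summary


# Toy data, invented for illustration only (not results from the paper).
toy = (
    [Prediction("small-model", c, 0.75) for c in (True, True, False, False)]
    + [Prediction("large-model", c, 0.5) for c in (True, True, True, False)]
)

for model, (acc, conf, gap) in confidence_gap(toy).items():
    print(f"{model}: accuracy={acc:.2f} confidence={conf:.2f} gap={gap:+.2f}")
```

A positive gap indicates overconfidence (confidence exceeds accuracy) and a negative gap indicates underconfidence. In this toy data the small model is overconfident and the large model underconfident, which is the direction of the pattern the abstract reports.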

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM effectiveness in global fact-checking contexts
Assessing model confidence-accuracy mismatch in misinformation detection
Identifying performance gaps in non-English and Global South claims
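A performance gap between groups of claims, such as English versus non-English, can be measured by computing accuracy per group and taking the difference. The sketch below is illustrative only: the record fields (`language_group`, `correct`) and the toy records are assumptions, not the paper's data schema or results.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: accuracy} for records grouped by the field named `key`."""
    totals = defaultdict(lambda: [0, 0])  # group -> [n_correct, n_total]
    for record in records:
        bucket = totals[record[key]]
        bucket[0] += int(record["correct"])
        bucket[1] += 1
    return {group: c / n for group, (c, n) in totals.items()}


def accuracy_gap(accuracies, reference, other):
    """Accuracy of the reference group minus accuracy of the other group."""
    return accuracies[reference] - accuracies[other]


# Toy records, invented for illustration only (not results from the paper).
records = (
    [{"language_group": "english", "correct": c} for c in (True, True, True, False)]
    + [{"language_group": "non-english", "correct": c} for c in (True, True, False, False)]
)

accuracies = accuracy_by_group(records, "language_group")
print(accuracies)
print(accuracy_gap(accuracies, "english", "non-english"))
```

The same function can group by any other field, for example a hypothetical region-of-origin label, to compare claims from the Global South with the rest.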

Innovation

Methods, ideas, or system contributions that make the work stand out.

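Systematic evaluation of nine LLMs spanning open- and closed-source, multiple sizes, diverse architectures, and reasoning-based models
Multilingual benchmark of 5,000 claims in 47 languages, with over 240,000 human annotations as ground truth
Testing model generalizability on claims postdating training cutoffs, under four prompting strategies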

Authors

Ihsan A. Qazi
Department of Computer Science, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Zohaib Khan
Department of Computer Science, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Abdullah Ghani
Department of Computer Science, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Agha A. Raza
Department of Computer Science, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Zafar A. Qazi
Department of Computer Science, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Wassay Sajjad
Research Associate, LUMS
Ayesha Ali
Department of Economics, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Asher Javaid
Department of Computer Science, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Muhammad Abdullah Sohail
Department of Computer Science, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Abdul H. Azeemi
Department of Computer Science, Lahore University of Management Sciences, Lahore, 54792, Pakistan