🤖 AI Summary
Large language models (LLMs) remain unreliable at political fact-checking, even when augmented with chain-of-thought reasoning and general web search. To address this, we propose a retrieval-augmented generation (RAG) framework that replaces broad web retrieval with high-quality, human-curated knowledge -- specifically, PolitiFact's verified fact-check summaries -- thereby systematically enhancing LLMs' capacity to assess political claims. Experiments on over 6,000 real-world political statements demonstrate that our curated RAG approach substantially improves fact-checking performance across major LLM variants, yielding an average 233% gain in macro-F1 over baseline methods. This work provides the first empirical validation that knowledge quality, not retrieval breadth, is critical for political fact-checking, establishing a new paradigm for developing high-credibility automated verification systems.
📝 Abstract
Large language models (LLMs) have raised hopes for automated end-to-end fact-checking, but prior studies report mixed results. As mainstream chatbots increasingly ship with reasoning capabilities and web search tools -- and millions of users already rely on them for verification -- rigorous evaluation is urgent. We evaluate 15 recent LLMs from OpenAI, Google, Meta, and DeepSeek on more than 6,000 claims fact-checked by PolitiFact, comparing standard models with reasoning and web-search variants. Standard models perform poorly, reasoning offers minimal benefits, and web search provides only moderate gains, despite fact-checks being available on the web. In contrast, a curated RAG system using PolitiFact summaries improves macro-F1 by 233% on average across model variants. These findings suggest that giving models access to curated high-quality context is a promising path for automated fact-checking.
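The core idea of the curated-RAG pipeline -- retrieve the most relevant fact-check summaries for a claim and supply them as context before the model renders a verdict -- can be sketched as follows. This is a minimal illustration, not the paper's actual system: the corpus entries, the token-overlap scoring, and the prompt format are all hypothetical stand-ins (a real system would use the full PolitiFact summary collection and a stronger retriever, e.g. dense embeddings).

```python
import re

# Hypothetical stand-ins for curated PolitiFact fact-check summaries.
CURATED_SUMMARIES = [
    "PolitiFact: the claim that the city cut police funding by half is False; budget records show a 4% reduction.",
    "PolitiFact: the statement that unemployment hit a 50-year low is Mostly True per BLS data.",
    "PolitiFact: the assertion that the bill eliminates income tax for retirees is Half True; it covers only pension income.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase and split into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(claim: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank corpus entries by token overlap with the claim; return top-k.

    A toy retriever -- real curated RAG would use a proper IR or
    embedding-based retriever over the fact-check archive.
    """
    claim_toks = tokenize(claim)
    ranked = sorted(corpus, key=lambda doc: -len(claim_toks & tokenize(doc)))
    return ranked[:k]

def build_prompt(claim: str, corpus: list[str]) -> str:
    """Assemble an LLM prompt with retrieved summaries as evidence."""
    evidence = retrieve(claim, corpus)
    context = "\n".join(f"- {doc}" for doc in evidence)
    return (
        f"Evidence from curated fact-checks:\n{context}\n\n"
        f"Claim: {claim}\n"
        f"Verdict (true / mostly-true / half-true / false):"
    )
```

The key contrast with web-search augmentation is that the retrieval pool here contains only verified fact-check summaries, so whatever reaches the model's context is already high-quality evidence rather than arbitrary web text.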