Multimodal Fact-Checking: An Agent-based Approach

📅 2025-12-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing automated fact-checking systems struggle to handle multimodal misinformation in real-world social media contexts, primarily due to the absence of dedicated datasets featuring verifiable evidence, human-level reasoning chains, and fully annotated multimodal instances. Method: We introduce RW-Post—the first real-world-driven, evidence-traceable, and reasoning-explainable multimodal fact-checking dataset—and propose AgentFact, a multi-agent framework that performs evidence-driven structured verification via iterative search-filter-reason phases involving five specialized agents. Our approach integrates LLM-assisted evidence extraction, task-aware evidence filtering, vision-language joint analysis, and strategy-guided explanation generation. Contribution/Results: Experiments demonstrate significant improvements in both verification accuracy and interpretability. Results validate that modeling real-world contextual complexity and enforcing structured, multi-step reasoning are critical for trustworthy multimodal fact-checking.

📝 Abstract
The rapid spread of multimodal misinformation poses a growing challenge for automated fact-checking systems. Existing approaches, including large vision-language models (LVLMs) and deep multimodal fusion methods, often fall short due to limited reasoning and shallow evidence utilization. A key bottleneck is the lack of dedicated datasets that provide complete real-world multimodal misinformation instances accompanied by annotated reasoning processes and verifiable evidence. To address this limitation, we introduce RW-Post, a high-quality and explainable dataset for real-world multimodal fact-checking. RW-Post aligns real-world multimodal claims with their original social media posts, preserving the rich contextual information in which the claims are made. In addition, the dataset includes detailed reasoning and explicitly linked evidence, derived from human-written fact-checking articles via a large language model (LLM)-assisted extraction pipeline, enabling comprehensive verification and explanation. Building upon RW-Post, we propose AgentFact, an agent-based multimodal fact-checking framework designed to emulate the human verification workflow. AgentFact consists of five specialized agents that collaboratively handle key fact-checking subtasks, including strategy planning, high-quality evidence retrieval, visual analysis, reasoning, and explanation generation. These agents are orchestrated through an iterative workflow that alternates between evidence searching, task-aware evidence filtering, and reasoning, facilitating strategic decision-making and systematic evidence analysis. Extensive experimental results demonstrate that the synergy between RW-Post and AgentFact substantially improves both the accuracy and interpretability of multimodal fact-checking.
Problem

Research questions and friction points this paper is trying to address.

Existing fact-checking systems show limited reasoning when confronting the spread of multimodal misinformation
Introduces RW-Post dataset for real-world multimodal fact-checking with contextual evidence
Proposes AgentFact framework using specialized agents to emulate human verification workflow
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset RW-Post with contextual multimodal claims and evidence
AgentFact framework with five specialized collaborative agents
Iterative workflow for evidence search, filtering, and reasoning
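The iterative search-filter-reason workflow described above can be sketched as a simple orchestration loop. This is a hypothetical illustration only: the class and method names (`AgentFactSketch`, `plan`, `search_and_filter`, etc.), the stopping rule, and the toy decision logic are assumptions, not the paper's actual implementation; each method is a placeholder for one of the five specialized agents.

```python
# Hedged sketch of a five-agent fact-checking loop in the spirit of AgentFact.
# All names and logic here are illustrative assumptions, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Verdict:
    label: str               # e.g. "refuted", "supported", "unverified"
    explanation: str
    evidence: list = field(default_factory=list)

class AgentFactSketch:
    """Plan -> search/filter -> visual analysis -> reason -> explain."""

    def __init__(self, max_rounds=3):
        self.max_rounds = max_rounds  # cap on search-filter-reason iterations

    def plan(self, claim):                       # strategy-planning agent
        return [claim["text"]]                   # seed search queries from the claim

    def search_and_filter(self, queries):        # evidence-retrieval agent
        # Placeholder: a real system would query the web and apply
        # task-aware filtering to the results.
        return [{"source": q, "stance": "unknown"} for q in queries]

    def analyze_image(self, claim):              # visual-analysis agent
        return {"image_present": claim.get("image") is not None}

    def reason(self, claim, evidence, visual):   # reasoning agent
        # Toy decision rule: any filtered evidence plus an analyzed image
        # yields a verdict; otherwise trigger another search round.
        if evidence and visual["image_present"]:
            return "refuted"                     # arbitrary outcome for the sketch
        return None                              # None => iterate again

    def explain(self, label, evidence):          # explanation-generation agent
        return f"Verdict '{label}' based on {len(evidence)} evidence item(s)."

    def verify(self, claim):
        evidence = []
        for _ in range(self.max_rounds):         # alternate search and reasoning
            evidence += self.search_and_filter(self.plan(claim))
            visual = self.analyze_image(claim)
            label = self.reason(claim, evidence, visual)
            if label is not None:
                return Verdict(label, self.explain(label, evidence), evidence)
        return Verdict("unverified", "Evidence inconclusive.", evidence)

claim = {"text": "Photo shows event X", "image": "post.jpg"}
result = AgentFactSketch().verify(claim)
print(result.label)
```

The point of the loop structure is that verification is not single-shot: the reasoning agent can decline to decide, sending control back to retrieval for another round of evidence gathering, which mirrors the alternating search/filter/reason phases the paper describes.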
Danni Xu
NUS
misinformation, LMMs
Shaojing Fan
Department of Electrical and Computer Engineering, National University of Singapore
Cognitive Vision, Computer Vision, Experimental Psychology
Xuanang Cheng
School of Computing (SoC), National University of Singapore (NUS), Singapore
Mohan Kankanhalli
School of Computing (SoC), National University of Singapore (NUS), Singapore