FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing multimodal large language model–based approaches to video misinformation detection, which rely on fixed-depth reasoning and struggle with scenarios involving sparse evidence or requiring external verification. The authors propose an agent-based iterative reasoning framework that formulates fact-checking as a dynamic process: by assessing task ambiguity, the system adaptively invokes external tools to gather critical evidence and iteratively refines its reasoning path. A two-stage training strategy is introduced—combining domain-specific agent supervised fine-tuning with decision-aware reinforcement learning—to enhance tool-calling efficiency and enable risk-sensitive decision calibration. Evaluated on the FakeSV, FakeTT, and FakeVV benchmarks, the method significantly outperforms current state-of-the-art approaches, demonstrating superior robustness and generalization capability.

📝 Abstract
Multimodal large language models (MLLMs) have substantially advanced video misinformation detection through unified multimodal reasoning, but they often rely on fixed-depth inference and place excessive trust in internally generated assumptions, particularly in scenarios where critical evidence is sparse, fragmented, or requires external verification. To address these limitations, we propose FactGuard, an agentic framework for video misinformation detection that formulates verification as an iterative reasoning process built upon MLLMs. FactGuard explicitly assesses task ambiguity and selectively invokes external tools to acquire critical evidence, enabling progressive refinement of reasoning trajectories. To further strengthen this capability, we introduce a two-stage training strategy that combines domain-specific agentic supervised fine-tuning with decision-aware reinforcement learning to optimize tool usage and calibrate risk-sensitive decision making. Extensive experiments on FakeSV, FakeTT, and FakeVV demonstrate FactGuard's state-of-the-art performance and validate its excellent robustness and generalization capacity.
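The abstract describes verification as an iterative loop: assess task ambiguity, selectively invoke external tools to gather evidence, and refine until a calibrated decision can be made. A minimal sketch of such a loop is below; this is an illustration of the general agentic pattern, not the paper's implementation, and every name (`assess_ambiguity`, `search_evidence`, `decide`, the threshold values) is a hypothetical stand-in.

```python
# Illustrative sketch of an agentic verification loop of the kind the
# abstract describes. All components are hypothetical stand-ins: a real
# system would back assess_ambiguity/decide with an MLLM and
# search_evidence with external tools (web search, OCR, ASR, etc.).
from dataclasses import dataclass, field

@dataclass
class VerificationState:
    claim: str
    evidence: list = field(default_factory=list)
    verdict: str = "undecided"

def assess_ambiguity(state: VerificationState) -> float:
    # Toy heuristic: ambiguity shrinks as evidence accumulates.
    return 1.0 / (1 + len(state.evidence))

def search_evidence(state: VerificationState) -> str:
    # Stand-in for an external tool call gathering one piece of evidence.
    return f"evidence-{len(state.evidence) + 1} for: {state.claim}"

def decide(state: VerificationState) -> str:
    # Stand-in for a risk-sensitive decision over gathered evidence.
    return "real" if len(state.evidence) >= 2 else "fake"

def verify(claim: str, max_steps: int = 4, threshold: float = 0.4) -> VerificationState:
    state = VerificationState(claim)
    for _ in range(max_steps):
        if assess_ambiguity(state) <= threshold:
            break  # confident enough; stop invoking tools
        state.evidence.append(search_evidence(state))
    state.verdict = decide(state)
    return state

state = verify("Video claims event X happened yesterday")
print(state.verdict, len(state.evidence))  # → real 2
```

The point of the sketch is the control flow: tool calls are conditional on estimated ambiguity rather than fixed in number, which is the contrast the abstract draws with fixed-depth inference.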
Problem

Research questions and friction points this paper is trying to address.

video misinformation detection
multimodal large language models
external verification
reasoning ambiguity
evidence sparsity
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic reasoning
reinforcement learning
multimodal large language models
external tool invocation
video misinformation detection
Zehao Li
Peking University
Operations research · Stochastic approximation
Hongwei Yu
University of Science and Technology Beijing, Beijing, China
Hao Jiang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Qiang Sheng
Chinese Academy of Sciences
fake news detection · fact checking · LLM safety
Yilong Xu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Baolong Bi
University of Chinese Academy of Sciences
trustworthy large language models
Yang Li
Institute of Automation, Chinese Academy of Sciences
MLLM · Agent · brain-inspired intelligence · Artificial intelligence
Zhenlong Yuan
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yujun Cai
NTU → Meta → Lecturer (Assistant Professor) @ UQ
Multi-Modal Perception · Vision-Language Models
Zhaoqi Wang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China