Fact2Fiction: Targeted Poisoning Attack on Agentic Fact-checking Systems

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-based autonomous fact-checking systems are vulnerable to targeted poisoning attacks: an adversary can exploit the explanatory justifications these systems generate to fabricate malicious supporting evidence, corrupting sub-claim verification and amplifying misinformation. This paper introduces Fact2Fiction, a novel poisoning attack framework designed for agent-based fact-checking systems that follow a decomposition–verification–aggregation architecture. Fact2Fiction uses prompt engineering and evidence fabrication to construct stealthy adversarial inputs, strategically incorporating the system's own self-generated justifications. Experiments across multiple poisoning budgets demonstrate that Fact2Fiction achieves an 8.9%–21.2% higher attack success rate than state-of-the-art methods. Crucially, it is the first work to expose reasoning-chain-level vulnerabilities in such systems, providing both a critical warning and a foundational benchmark for future research on robustness and defense.

📝 Abstract
State-of-the-art fact-checking systems combat misinformation at scale by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanatory rationales for the verdicts). The security of these systems is crucial, as compromised fact-checkers, a largely underexplored threat, can amplify misinformation. This work introduces Fact2Fiction, the first poisoning attack framework targeting such agentic fact-checking systems. Fact2Fiction mirrors the decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%--21.2% higher attack success rates than state-of-the-art attacks across various poisoning budgets. Fact2Fiction exposes security weaknesses in current fact-checking systems and highlights the need for defensive countermeasures.
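The decomposition–verification–aggregation pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: all names are hypothetical, and a naive keyword verifier stands in for the LLM-based retrieval and verification agents a real system would use.

```python
from dataclasses import dataclass

@dataclass
class SubClaimResult:
    sub_claim: str
    verdict: bool          # True = supported by retrieved evidence
    justification: str     # explanatory rationale for this verdict

def decompose(claim: str) -> list[str]:
    # A real system would prompt an LLM agent; here we naively split on "and".
    return [part.strip() for part in claim.split(" and ")]

def verify(sub_claim: str, evidence_corpus: list[str]) -> SubClaimResult:
    # Stand-in for agentic retrieval + verification: a sub-claim counts as
    # "supported" if any evidence passage contains it verbatim.
    supporting = [e for e in evidence_corpus if sub_claim.lower() in e.lower()]
    verdict = bool(supporting)
    justification = (
        f"Supported by {len(supporting)} passage(s)" if verdict
        else "No supporting evidence found"
    )
    return SubClaimResult(sub_claim, verdict, justification)

def aggregate(results: list[SubClaimResult]) -> tuple[bool, list[str]]:
    # Overall verdict requires every sub-claim to hold; the justifications
    # are surfaced to users, which is the channel Fact2Fiction exploits.
    overall = all(r.verdict for r in results)
    return overall, [r.justification for r in results]

claim = "the dam failed and the town flooded"
corpus = ["Reports confirm the dam failed on Tuesday."]
verdict, rationales = aggregate([verify(s, corpus) for s in decompose(claim)])
print(verdict)  # False: "the town flooded" has no supporting passage
```

Because the final verdict is an aggregate over sub-claim verdicts, poisoning the evidence behind even one sub-claim can flip the outcome, which is why the attack targets the decomposition level rather than the whole claim.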
Problem

Research questions and friction points this paper is trying to address.

Targeted poisoning attack on agentic fact-checking systems
Exploiting system-generated justifications to compromise verification
Exposing security weaknesses in autonomous LLM-based fact-checkers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Poisoning attack framework targets agentic fact-checking systems
Exploits system-generated justifications to craft malicious evidence
Mirrors decomposition strategy to compromise sub-claim verification
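The attack idea in the bullets above, mirroring the system's decomposition and seeding one fabricated passage per sub-claim that echoes the system's own justification, can be sketched in toy form. This is purely illustrative: the function names are hypothetical, and the real framework uses LLMs to approximate the decomposition and generate the poisoned passages.

```python
def mirror_decompose(target_claim: str) -> list[str]:
    # Attacker approximates the fact-checker's decomposition step.
    return [part.strip() for part in target_claim.split(" and ")]

def craft_poison(sub_claim: str, leaked_justification: str) -> str:
    # Fabricated "evidence" embeds the sub-claim verbatim so the verifier's
    # retrieval picks it up, plus wording borrowed from a justification the
    # system previously emitted for a similar claim.
    return f"Verified report: {sub_claim}. ({leaked_justification})"

def poison_corpus(target_claim: str, justification: str, budget: int) -> list[str]:
    # Spend the poisoning budget across sub-claims, one passage each.
    subs = mirror_decompose(target_claim)
    passages = [craft_poison(s, justification) for s in subs]
    return passages[:budget]

docs = poison_corpus("the town flooded and aid was blocked",
                     "No supporting evidence found", budget=2)
for d in docs:
    print(d)
```

The budget parameter reflects the paper's experimental setting, where attack success is measured across varying numbers of injected documents.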
Haorui He
Department of Interactive Media, Hong Kong Baptist University
Yupeng Li
Department of Interactive Media, Hong Kong Baptist University
Bin Benjamin Zhu
Microsoft Corporation
Dacheng Wen
Department of Interactive Media, Hong Kong Baptist University
Reynold Cheng
ACM Distinguished Member, HKU Computer Science Professor
Data Uncertainty · Graph Databases · Data Science for Social Goods
Francis C. M. Lau
Honorary Professor, The University of Hong Kong
Computer Science · Computer Systems · Networks