Fact2Fiction: Targeted Poisoning Attack on Agentic Fact-checking Systems

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-based autonomous fact-checking systems are vulnerable to targeted poisoning attacks: an adversary can exploit the explanatory justifications these systems generate to fabricate malicious supporting evidence, corrupting sub-claim verification and amplifying misinformation. This paper introduces Fact2Fiction, a novel poisoning attack framework designed for agent-based fact-checking systems that follow a decomposition–verification–aggregation architecture. Fact2Fiction uses prompt engineering and evidence fabrication to construct stealthy adversarial inputs, strategically incorporating the system's own self-generated justifications. Experiments across multiple poisoning budgets demonstrate that Fact2Fiction achieves an 8.9%–21.2% higher attack success rate than state-of-the-art methods. Crucially, it is the first work to expose reasoning-chain-level vulnerabilities in such systems, providing both a critical warning and a foundational benchmark for future research on robustness and defense.

📝 Abstract
State-of-the-art fact-checking systems combat misinformation at scale by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanatory rationales for the verdicts). The security of these systems is crucial, as compromised fact-checkers, a largely underexplored threat, can amplify misinformation. This work introduces Fact2Fiction, the first poisoning attack framework targeting such agentic fact-checking systems. Fact2Fiction mirrors the decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%--21.2% higher attack success rates than state-of-the-art attacks across various poisoning budgets. Fact2Fiction exposes security weaknesses in current fact-checking systems and highlights the need for defensive countermeasures.
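The decomposition–verification–aggregation pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: all names are hypothetical, and a naive keyword verifier stands in for the LLM-based retrieval and verification agents a real system would use.

```python
from dataclasses import dataclass

@dataclass
class SubClaimResult:
    sub_claim: str
    verdict: bool          # True = supported by retrieved evidence
    justification: str     # explanatory rationale for this verdict

def decompose(claim: str) -> list[str]:
    # A real system would prompt an LLM agent; here we naively split on "and".
    return [part.strip() for part in claim.split(" and ")]

def verify(sub_claim: str, evidence_corpus: list[str]) -> SubClaimResult:
    # Stand-in for agentic retrieval + verification: a sub-claim counts as
    # "supported" if any evidence passage contains it verbatim.
    supporting = [e for e in evidence_corpus if sub_claim.lower() in e.lower()]
    verdict = bool(supporting)
    justification = (
        f"Supported by {len(supporting)} passage(s)" if verdict
        else "No supporting evidence found"
    )
    return SubClaimResult(sub_claim, verdict, justification)

def aggregate(results: list[SubClaimResult]) -> tuple[bool, list[str]]:
    # Overall verdict requires every sub-claim to hold; the justifications
    # are surfaced to users, which is the channel Fact2Fiction exploits.
    overall = all(r.verdict for r in results)
    return overall, [r.justification for r in results]

claim = "the dam failed and the town flooded"
corpus = ["Reports confirm the dam failed on Tuesday."]
verdict, rationales = aggregate([verify(s, corpus) for s in decompose(claim)])
print(verdict)  # False: "the town flooded" has no supporting passage
```

Because the final verdict is an aggregate over sub-claim verdicts, poisoning the evidence behind even one sub-claim can flip the outcome, which is why the attack targets the decomposition level rather than the whole claim.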
Problem

Research questions and friction points this paper is trying to address.

Targeted poisoning attack on agentic fact-checking systems
Exploiting system-generated justifications to compromise verification
Exposing security weaknesses in autonomous LLM-based fact-checkers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Poisoning attack framework targets agentic fact-checking systems
Exploits system-generated justifications to craft malicious evidence
Mirrors decomposition strategy to compromise sub-claim verification
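The attack idea in the bullets above, mirroring the system's decomposition and seeding one fabricated passage per sub-claim that echoes the system's own justification, can be sketched in toy form. This is purely illustrative: the function names are hypothetical, and the real framework uses LLMs to approximate the decomposition and generate the poisoned passages.

```python
def mirror_decompose(target_claim: str) -> list[str]:
    # Attacker approximates the fact-checker's decomposition step.
    return [part.strip() for part in target_claim.split(" and ")]

def craft_poison(sub_claim: str, leaked_justification: str) -> str:
    # Fabricated "evidence" embeds the sub-claim verbatim so the verifier's
    # retrieval picks it up, plus wording borrowed from a justification the
    # system previously emitted for a similar claim.
    return f"Verified report: {sub_claim}. ({leaked_justification})"

def poison_corpus(target_claim: str, justification: str, budget: int) -> list[str]:
    # Spend the poisoning budget across sub-claims, one passage each.
    subs = mirror_decompose(target_claim)
    passages = [craft_poison(s, justification) for s in subs]
    return passages[:budget]

docs = poison_corpus("the town flooded and aid was blocked",
                     "No supporting evidence found", budget=2)
for d in docs:
    print(d)
```

The budget parameter reflects the paper's experimental setting, where attack success is measured across varying numbers of injected documents.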
Haorui He
Department of Interactive Media, Hong Kong Baptist University
Yupeng Li
Department of Interactive Media, Hong Kong Baptist University
Bin Benjamin Zhu
Microsoft Corporation
Dacheng Wen
Department of Interactive Media, Hong Kong Baptist University
Reynold Cheng
ACM Distinguished Member, HKU Computer Science Professor
Data Uncertainty · Graph Databases · Data Science for Social Goods
Francis C. M. Lau
Honorary Professor, The University of Hong Kong
Computer Science · Computer Systems · Networks