LLM-Based Adversarial Persuasion Attacks on Fact-Checking Systems

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a novel adversarial persuasion attack on automated fact-checking systems, demonstrating that such systems can be evaded through strategic rhetorical manipulation of claims. For the first time, persuasion techniques are integrated into an adversarial attack framework: a large language model rewrites false claims using 15 distinct techniques grouped into six categories of persuasion. A decoupled evaluation methodology systematically assesses the impact of these attacks on both fact-checking accuracy and evidence retrieval performance. Experiments on the FEVER and FEVEROUS benchmarks reveal that the proposed persuasion attacks significantly degrade system effectiveness, highlighting the potency of rhetorical strategies in misleading automated fact-checkers. These findings underscore the need to account for persuasive manipulation in the design of more robust fact-checking systems.
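The attack pipeline the summary describes, an LLM rephrasing a claim under a chosen persuasion technique, could be sketched roughly as below. The technique names, category labels, prompt wording, and the `build_attack_prompt` helper are all illustrative assumptions; the paper's actual 15-technique taxonomy and prompts may differ.

```python
# Illustrative sketch of a persuasion-attack prompt builder (not the paper's
# released code). Each technique maps to one of the broader persuasion
# categories; both names here are generic placeholders.
PERSUASION_TECHNIQUES = {
    "appeal_to_authority": "ethos",
    "appeal_to_fear": "pathos",
    "loaded_language": "pathos",
    "repetition": "style",
}

def build_attack_prompt(claim: str, technique: str) -> str:
    """Compose an instruction asking a generative LLM to rewrite a claim
    with a specific persuasion technique while keeping its factual content."""
    if technique not in PERSUASION_TECHNIQUES:
        raise ValueError(f"unknown technique: {technique}")
    return (
        f"Rewrite the following claim using the persuasion technique "
        f"'{technique}'. Preserve the claim's factual content; change "
        f"only its rhetorical framing.\n\nClaim: {claim}"
    )

# The resulting prompt would then be sent to an LLM; the rewritten claim
# is the adversarial input fed to the fact-checking system.
prompt = build_attack_prompt(
    "The Eiffel Tower was completed in 1890.", "loaded_language"
)
```

The key design point the summary implies is meaning preservation: the claim stays false, only its surface rhetoric changes, so a robust fact-checker should still reject it.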

📝 Abstract
Automated fact-checking (AFC) systems are susceptible to adversarial attacks, enabling false claims to evade detection. Existing adversarial frameworks typically rely on injecting noise or altering semantics, yet none exploits the adversarial potential of persuasion techniques, which are widely used in disinformation campaigns to manipulate audiences. In this paper, we introduce a novel class of persuasive adversarial attacks on AFC systems by employing a generative LLM to rephrase claims using persuasion techniques. Considering 15 techniques grouped into 6 categories, we study the effects of persuasion on both claim verification and evidence retrieval using a decoupled evaluation strategy. Experiments on the FEVER and FEVEROUS benchmarks show that persuasion attacks can substantially degrade both verification performance and evidence retrieval. Our analysis identifies persuasion techniques as a potent class of adversarial attacks, highlighting the need for more robust AFC systems.
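The decoupled evaluation strategy measures the two AFC stages separately: claim verification (label accuracy) and evidence retrieval (how much gold evidence the retriever recovers). A minimal sketch of such metrics follows; the function names and the per-claim recall definition are assumptions for illustration, not the paper's or FEVER's exact scoring protocol.

```python
# Illustrative sketch of decoupled AFC metrics (assumed definitions).

def verification_accuracy(preds: list, golds: list) -> float:
    """Fraction of claims whose predicted verdict matches the gold label,
    e.g. SUPPORTS / REFUTES / NOT ENOUGH INFO."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def evidence_recall(retrieved: list, gold_evidence: list) -> float:
    """Per-claim fraction of gold evidence items recovered by the
    retriever, averaged over all claims."""
    scores = []
    for ret, gold in zip(retrieved, gold_evidence):
        scores.append(len(set(ret) & set(gold)) / len(gold))
    return sum(scores) / len(scores)

# Decoupling means each metric is computed before and after the persuasion
# attack, so degradation can be attributed to verification, retrieval, or both.
acc = verification_accuracy(
    ["SUPPORTS", "REFUTES", "NEI"], ["SUPPORTS", "REFUTES", "REFUTES"]
)
recall = evidence_recall(
    [["e1"], ["e2", "e3"]], [["e1", "e4"], ["e2", "e3"]]
)
```

Reporting the two numbers separately is what lets the paper show that persuasion attacks hurt both stages rather than only the final verdict.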
Problem

Research questions and friction points this paper is trying to address.

adversarial attacks
fact-checking systems
persuasion techniques
LLM
disinformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

persuasive adversarial attacks
large language models
automated fact-checking
disinformation
evidence retrieval