Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks

📅 2026-01-21
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
This study addresses the vulnerability of existing fake news detection models to adversarial sentiment manipulation, which undermines prediction stability. The work proposes AdSent, the first systematic framework leveraging large language models (LLMs) to generate sentiment-controllable adversarial examples for fake news detection. AdSent introduces a sentiment-agnostic training strategy to build detectors robust to sentiment shifts, effectively mitigating the inherent bias toward neutral sentiment prevalent in current models. Evaluated on three benchmark datasets, the proposed method significantly outperforms state-of-the-art approaches, demonstrating marked improvements in prediction consistency and accuracy on both original and sentiment-manipulated news, as well as enhanced cross-dataset generalization capability.

πŸ“ Abstract
Misinformation and fake news have become a pressing societal challenge, driving the need for reliable automated detection methods. Prior research has highlighted sentiment as an important signal in fake news detection, either by analyzing which sentiments are associated with fake news or by using sentiment and emotion features for classification. However, this reliance creates a vulnerability: adversaries can manipulate sentiment to evade detectors, especially with the advent of large language models (LLMs). A few studies have explored adversarial samples generated by LLMs, but they mainly focus on stylistic features such as the writing style of news publishers. Thus, the crucial vulnerability of sentiment manipulation remains largely unexplored. In this paper, we investigate the robustness of state-of-the-art fake news detectors under sentiment manipulation. We introduce AdSent, a sentiment-robust detection framework designed to ensure consistent veracity predictions across both original and sentiment-altered news articles. Specifically, we (1) propose controlled sentiment-based adversarial attacks using LLMs; (2) analyze the impact of sentiment shifts on detection performance, showing that changing the sentiment heavily degrades fake news detection models and reveals a bias toward classifying neutral articles as real while non-neutral articles are often classified as fake; and (3) introduce a novel sentiment-agnostic training strategy that enhances robustness against such perturbations. Extensive experiments on three benchmark datasets demonstrate that AdSent significantly outperforms competitive baselines in both accuracy and robustness, while also generalizing effectively to unseen datasets and adversarial scenarios.
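The sentiment-agnostic training idea in the abstract (consistent veracity predictions for an article and its sentiment-shifted variant) can be sketched as a combined loss: supervised cross-entropy on both versions plus a consistency penalty between the two predictions. This is a minimal illustration, not the paper's actual objective; the squared-difference consistency term and the `lam` weighting hyperparameter are assumptions for the sketch.

```python
import math

def bce(p, y):
    # Binary cross-entropy for one predicted probability p and label y (0 or 1).
    eps = 1e-9
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def sentiment_agnostic_loss(p_orig, p_shifted, y, lam=1.0):
    """Toy sentiment-agnostic objective: supervise the detector on both the
    original article (p_orig) and its sentiment-altered version (p_shifted),
    which share the same veracity label y, and penalize disagreement between
    the two predictions so sentiment alone cannot flip the verdict."""
    supervised = bce(p_orig, y) + bce(p_shifted, y)
    consistency = (p_orig - p_shifted) ** 2
    return supervised + lam * consistency
```

Under this formulation, a detector that predicts 0.9 for a real article but only 0.2 for its sentiment-shifted rewrite pays both a supervision and a consistency cost, whereas agreeing predictions are penalized only by the supervised terms.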
Problem

Research questions and friction points this paper is trying to address.

Fake News Detection
Adversarial Attacks
Sentiment Manipulation
Large Language Models
Robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Sentiment Attacks
Large Language Models
Fake News Detection
Sentiment-Agnostic Training
Robustness