🤖 AI Summary
This study addresses the vulnerability of large language models (LLMs) to generating rhetorically manipulative propaganda content in open-ended interactions. It presents the first systematic evaluation of LLMs’ capacity to produce such content, introducing a novel assessment framework that integrates a propaganda text classifier with a rhetorical device detection model. The work comparatively analyzes the effectiveness of prominent alignment techniques—supervised fine-tuning (SFT), direct preference optimization (DPO), and odds ratio preference optimization (ORPO)—in mitigating rhetorical manipulation. Experimental results demonstrate that ORPO achieves superior performance in suppressing propagandistic outputs, significantly reducing the model’s propensity to produce content exhibiting manipulative rhetoric. These findings substantiate ORPO’s efficacy as an alignment strategy for enhancing the safety and reliability of LLMs in adversarial or unstructured settings.
📝 Abstract
Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and employ a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO). We find that fine-tuning significantly reduces the models' tendency to generate such content, with ORPO proving most effective.
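The two-stage evaluation framework described above can be sketched as follows. The actual study uses trained domain-specific models; in this illustrative stand-in, simple keyword heuristics play the role of both the propaganda classifier and the rhetorical device detector, and all cue lexicons, function names, and thresholds are hypothetical.

```python
# Sketch of the two-stage evaluation pipeline: a binary propaganda
# classifier plus a rhetorical-device detector, applied to LLM outputs.
# Keyword heuristics stand in for the paper's trained models.

RHETORICAL_DEVICE_CUES = {
    "loaded_language": ["disastrous", "radical", "outrageous"],
    "appeal_to_fear": ["threat", "danger", "destroy"],
    "flag_waving": ["our nation", "true patriots"],
    "name_calling": ["traitor", "puppet", "extremist"],
}

def detect_devices(text: str) -> list[str]:
    """Return the rhetorical devices whose cue phrases appear in `text`."""
    lowered = text.lower()
    return [device for device, cues in RHETORICAL_DEVICE_CUES.items()
            if any(cue in lowered for cue in cues)]

def classify_propaganda(text: str) -> bool:
    """Toy binary classifier: flag text containing any detected device."""
    return len(detect_devices(text)) > 0

def evaluate_outputs(outputs: list[str]) -> dict:
    """Aggregate per-output judgments into corpus-level statistics."""
    flagged = [detect_devices(o) for o in outputs if classify_propaganda(o)]
    return {
        "propaganda_rate": len(flagged) / len(outputs),
        "devices_used": sorted({d for devices in flagged for d in devices}),
    }

sample_outputs = [
    "The new policy takes effect in March.",
    "Only true patriots will stop this disastrous threat to our nation.",
]
report = evaluate_outputs(sample_outputs)
print(report)
```

In the study itself, the aggregated statistics (how often outputs are flagged, and which devices appear) are what allow the alignment methods (SFT, DPO, ORPO) to be compared before and after fine-tuning.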