Persuasion and Safety in the Era of Generative AI

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the AI safety challenge of distinguishing rational persuasion from cognitive manipulation—a critical concern amid growing risks of LLM-driven manipulative behavior. Drawing on cognitive science and rhetoric theory, we propose the first fine-grained, operationally defined taxonomy of persuasive techniques. Through iterative expert annotation and inter-annotator agreement validation, we release the first high-quality, human-annotated dataset specifically designed for manipulation detection. Using zero-shot and few-shot prompting paradigms, we systematically evaluate state-of-the-art LLMs’ discrimination capabilities. Empirical results reveal significant deficiencies in current models’ ability to detect covert manipulation tactics—particularly emotional hijacking and false dilemmas. Our study bridges a key empirical gap in AI ethics by establishing a measurable, evaluable, and governable foundation for manipulation assessment. The dataset and methodology provide essential benchmarks for regulatory compliance (e.g., EU AI Act), alignment evaluation, and the development of oversight tools.
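The summary mentions validating the dataset through inter-annotator agreement. The paper does not specify its metric, but a standard choice for two annotators is Cohen's kappa; here is a minimal self-contained sketch (the labels and annotations are illustrative, not drawn from the paper's dataset):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items.

    po = observed agreement; pe = agreement expected by chance,
    given each annotator's label distribution.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical annotations over four text snippets:
ann1 = ["manipulation", "persuasion", "manipulation", "persuasion"]
ann2 = ["manipulation", "persuasion", "persuasion", "persuasion"]
print(cohens_kappa(ann1, ann2))  # → 0.5
```

In practice, annotation campaigns like the one described would iterate on the taxonomy's operational definitions until kappa (or a multi-annotator variant such as Fleiss' kappa) reaches an acceptable threshold.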

📝 Abstract
As large language models (LLMs) achieve advanced persuasive capabilities, concerns about their potential risks have grown. The EU AI Act prohibits AI systems that use manipulative or deceptive techniques to undermine informed decision-making, highlighting the need to distinguish between rational persuasion, which engages reason, and manipulation, which exploits cognitive biases. My dissertation addresses the lack of empirical studies in this area by developing a taxonomy of persuasive techniques, creating a human-annotated dataset, and evaluating LLMs' ability to distinguish between these methods. This work contributes to AI safety by providing resources to mitigate the risks of persuasive AI and fostering discussions on ethical persuasion in the age of generative AI.
Problem

Research questions and friction points this paper is trying to address.

Distinguishing rational persuasion from manipulative AI techniques
Developing taxonomy and dataset for AI persuasive methods
Evaluating LLMs' ability to distinguish ethical from unethical persuasion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed taxonomy of persuasive techniques
Created human-annotated dataset for evaluation
Assessed LLMs' ability to distinguish methods
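The evaluation described above uses zero-shot and few-shot prompting. A hypothetical sketch of how such prompts might be constructed (the taxonomy labels, example texts, and `build_prompt` helper are illustrative assumptions, not the paper's actual setup):

```python
# Illustrative technique labels; the paper's actual taxonomy is finer-grained.
TAXONOMY = ["emotional hijacking", "false dilemma", "rational appeal"]

def build_prompt(text, examples=()):
    """Build a classification prompt for an LLM.

    With no examples this is a zero-shot prompt; passing (text, label)
    pairs turns it into a few-shot prompt.
    """
    parts = [
        "Classify the persuasive technique used in the text as one of: "
        + ", ".join(TAXONOMY) + "."
    ]
    for ex_text, ex_label in examples:
        parts.append(f"Text: {ex_text}\nLabel: {ex_label}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)

# Zero-shot:
print(build_prompt("Act now or lose everything!"))

# Few-shot, with one hypothetical demonstration:
print(build_prompt(
    "Act now or lose everything!",
    examples=[("You must choose A or B.", "false dilemma")],
))
```

The model's completion after the final `Label:` would then be compared against the human annotation, which is how discrimination accuracy on covert tactics like emotional hijacking can be measured.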