Persuasion and Safety in the Era of Generative AI

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the AI safety challenge of distinguishing rational persuasion from cognitive manipulation—a critical concern amid growing risks of LLM-driven manipulative behavior. Drawing on cognitive science and rhetoric theory, we propose the first fine-grained, operationally defined taxonomy of persuasive techniques. Through iterative expert annotation and inter-annotator agreement validation, we release the first high-quality, human-annotated dataset specifically designed for manipulation detection. Using zero-shot and few-shot prompting paradigms, we systematically evaluate state-of-the-art LLMs’ discrimination capabilities. Empirical results reveal significant deficiencies in current models’ ability to detect covert manipulation tactics—particularly emotional hijacking and false dilemmas. Our study bridges a key empirical gap in AI ethics by establishing a measurable, evaluable, and governable foundation for manipulation assessment. The dataset and methodology provide essential benchmarks for regulatory compliance (e.g., EU AI Act), alignment evaluation, and the development of oversight tools.
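The summary mentions validating the dataset through inter-annotator agreement. The paper does not specify its metric, but a standard choice for two annotators is Cohen's kappa; here is a minimal self-contained sketch (the labels and annotations are illustrative, not drawn from the paper's dataset):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items.

    po = observed agreement; pe = agreement expected by chance,
    given each annotator's label distribution.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical annotations over four text snippets:
ann1 = ["manipulation", "persuasion", "manipulation", "persuasion"]
ann2 = ["manipulation", "persuasion", "persuasion", "persuasion"]
print(cohens_kappa(ann1, ann2))  # → 0.5
```

In practice, annotation campaigns like the one described would iterate on the taxonomy's operational definitions until kappa (or a multi-annotator variant such as Fleiss' kappa) reaches an acceptable threshold.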

📝 Abstract
As large language models (LLMs) achieve advanced persuasive capabilities, concerns about their potential risks have grown. The EU AI Act prohibits AI systems that use manipulative or deceptive techniques to undermine informed decision-making, highlighting the need to distinguish between rational persuasion, which engages reason, and manipulation, which exploits cognitive biases. My dissertation addresses the lack of empirical studies in this area by developing a taxonomy of persuasive techniques, creating a human-annotated dataset, and evaluating LLMs' ability to distinguish between these methods. This work contributes to AI safety by providing resources to mitigate the risks of persuasive AI and fostering discussions on ethical persuasion in the age of generative AI.
Problem

Research questions and friction points this paper is trying to address.

Distinguishing rational persuasion from manipulative AI techniques
Developing taxonomy and dataset for AI persuasive methods
Evaluating LLMs' ability to distinguish ethical from unethical persuasion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed taxonomy of persuasive techniques
Created human-annotated dataset for evaluation
Assessed LLMs' ability to distinguish methods
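The evaluation described above uses zero-shot and few-shot prompting. A hypothetical sketch of how such prompts might be constructed (the taxonomy labels, example texts, and `build_prompt` helper are illustrative assumptions, not the paper's actual setup):

```python
# Illustrative technique labels; the paper's actual taxonomy is finer-grained.
TAXONOMY = ["emotional hijacking", "false dilemma", "rational appeal"]

def build_prompt(text, examples=()):
    """Build a classification prompt for an LLM.

    With no examples this is a zero-shot prompt; passing (text, label)
    pairs turns it into a few-shot prompt.
    """
    parts = [
        "Classify the persuasive technique used in the text as one of: "
        + ", ".join(TAXONOMY) + "."
    ]
    for ex_text, ex_label in examples:
        parts.append(f"Text: {ex_text}\nLabel: {ex_label}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)

# Zero-shot:
print(build_prompt("Act now or lose everything!"))

# Few-shot, with one hypothetical demonstration:
print(build_prompt(
    "Act now or lose everything!",
    examples=[("You must choose A or B.", "false dilemma")],
))
```

The model's completion after the final `Label:` would then be compared against the human annotation, which is how discrimination accuracy on covert tactics like emotional hijacking can be measured.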