Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language

📅 2024-06-25
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work systematically evaluates large language models' (LLMs) capacity to generate persuasive language. To this end, we introduce Persuasive-Pairs, the first cross-domain benchmark of expert-annotated text pairs with relative persuasiveness scores across diverse domains. We formulate two tasks: explicit rewriting to enhance or diminish persuasiveness, and neutral paraphrasing. Crucially, our experiments reveal that role-based prompting, commonly used in system instructions, induces an implicit persuasive bias in LLMs even during ostensibly neutral paraphrasing. Leveraging outputs from models including LLaMA3, we train a regression-based persuasiveness scoring model to quantitatively assess both model capabilities and prompting strategies. Our key contributions are: (1) the first generalizable, multi-domain persuasiveness benchmark; (2) empirical identification of the implicit persuasive effect of role prompting; and (3) an empirically grounded evaluation paradigm for controllable and trustworthy language generation.
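The regression-based scoring model mentioned above can be illustrated with a minimal sketch. The authors fine-tune a model on multi-annotated text pairs; the TF-IDF + Ridge pipeline, the `[SEP]` pair encoding, and all data below are invented stand-ins for illustration, not the paper's actual method.

```python
# Hypothetical sketch: train a regression model that scores how much more (or
# less) persuasive a rewrite is than its source text. The paper trains a
# stronger learned scorer; TF-IDF + Ridge here is a simplified stand-in, and
# the toy pairs and scores below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# (original, rewrite, relative persuasiveness): positive means the rewrite is
# more persuasive than the original, negative means less, zero means neutral.
pairs = [
    ("The plan may help.", "This bold plan will transform our future!", 2.0),
    ("Act now, before it is too late!", "You could act at some point.", -2.0),
    ("Results were mixed.", "Results were mixed overall.", 0.0),
]

# Encode each pair as one string "original [SEP] rewrite" so the regressor
# sees both texts (a simplification of proper pair encoding).
X = [f"{orig} [SEP] {rewrite}" for orig, rewrite, _ in pairs]
y = [score for _, _, score in pairs]

scorer = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
scorer.fit(X, y)

# Score an unseen pair; the sign indicates the direction of the change.
pred = scorer.predict(
    ["The plan may help. [SEP] This plan will change everything!"]
)
print(round(float(pred[0]), 2))
```

With a real dataset such as Persuasive-Pairs, the same interface (pair in, relative score out) lets the scorer benchmark new LLMs and prompting strategies across domains.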

📝 Abstract
We are exposed to much information trying to influence us, such as teaser messages, debates, politically framed news, and propaganda, all of which use persuasive language. With the recent interest in Large Language Models (LLMs), we study the ability of LLMs to produce persuasive text. As opposed to prior work which focuses on particular domains or types of persuasion, we conduct a general study across various domains to measure and benchmark to what degree LLMs produce persuasive language, both when explicitly instructed to rewrite text to be more or less persuasive and when only instructed to paraphrase. We construct the new dataset Persuasive-Pairs of pairs of a short text and its rewrite by an LLM to amplify or diminish persuasive language. We multi-annotate the pairs on a relative scale for persuasive language: a valuable resource in itself, and for training a regression model to score and benchmark persuasive language, including for new LLMs across domains. In our analysis, we find that different 'personas' in LLaMA3's system prompt change persuasive language substantially, even when only instructed to paraphrase.
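The persona effect described in the abstract can be sketched as follows: the same paraphrase-only instruction is paired with different system-prompt personas, and the resulting outputs would then be scored for persuasive language. The persona wordings and the helper function below are hypothetical; no model is called.

```python
# Hypothetical sketch of the persona setup: one neutral paraphrase instruction,
# varied only in the system-prompt 'persona'. The persona strings are invented
# examples; in the paper's setting, each request would be sent to LLaMA3 and
# the paraphrases scored for persuasive language.
def build_paraphrase_request(persona: str, text: str) -> list[dict]:
    """Return a chat-style message list for a paraphrase-only instruction."""
    return [
        {"role": "system", "content": f"You are {persona}."},
        {"role": "user", "content": f"Paraphrase the following text:\n{text}"},
    ]

personas = ["a motivational speaker", "a neutral news editor", "a scientist"]
text = "The new policy takes effect next month."

# Identical user instruction across requests; only the persona differs, which
# isolates the system prompt as the source of any persuasiveness shift.
requests = [build_paraphrase_request(p, text) for p in personas]
for req in requests:
    print(req[0]["content"])
```

Holding the user turn fixed while varying only the system prompt is what lets the paper attribute differences in persuasiveness to the persona rather than to the instruction.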
Problem

Research questions and friction points this paper is trying to address.

Benchmarking LLMs' persuasive text generation
Assessing persuasive language across diverse domains
Analyzing persona impact on LLM persuasion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructing the multi-annotated Persuasive-Pairs dataset
Training a regression model to score persuasive language across domains
Showing that system-prompt personas shift persuasiveness even under paraphrase-only instructions