🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit gender-stereotyped linguistic differences when generating persuasive texts tailored to recipients of different genders. Drawing on theories from social psychology and communication, we developed an evaluation framework that employs paired prompting instructions to test 13 prominent LLMs across 16 languages. Using an LLM-as-judge approach, we automatically assessed 19 categories of persuasive linguistic features. Our analysis reveals, for the first time, pervasive gender-linked stereotypical language patterns in LLM-generated persuasive content across multilingual contexts, closely aligning with established sociolinguistic theories. These findings underscore the risk that current models systematically reproduce societal biases when operating in cross-cultural settings.
📝 Abstract
Large language models (LLMs) are increasingly used for everyday communication tasks, including drafting interpersonal messages intended to influence and persuade. Prior work has shown that LLMs can successfully persuade humans and amplify persuasive language. It is therefore essential to understand how user instructions affect the generation of persuasive language, and to understand whether the generated persuasive language differs, for example, when targeting different groups. In this work, we propose a framework for evaluating how persuasive language generation is affected by recipient gender, sender intent, or output language. We evaluate 13 LLMs and 16 languages using pairwise prompt instructions. We evaluate model responses on 19 categories of persuasive language using an LLM-as-judge setup grounded in social psychology and communication science. Our results reveal significant gender differences in the persuasive language generated across all models. These patterns reflect biases consistent with gender-stereotypical linguistic tendencies documented in social psychology and sociolinguistics.