🤖 AI Summary
This study investigates how large language models may introduce and amplify stereotypical biases when generating personalized climate communication tailored to different age and gender groups. The authors construct a controlled evaluation framework, employing both isolated and context-rich generation settings, presenting the first systematic examination of how demographic conditioning influences model outputs. Analyzing responses from GPT-4o, Llama-3.3, and Mistral-Large 2.1 across lexical content, linguistic style, and persuasive framing, they find that messages targeting men and younger audiences emphasize agency and innovation, whereas those for women and older adults are framed around care and tradition. Contextual prompts significantly exacerbate these biases and increase perceived persuasiveness for younger or male recipients. The work proposes a realistic evaluation paradigm integrating thematic and geographic context, revealing how contextual cues can intensify bias in model-generated text.
📝 Abstract
Large language models (LLMs) are increasingly capable of generating personalized, persuasive text at scale, raising new questions about bias and fairness in automated communication. This paper presents the first systematic analysis of how LLMs behave when tasked with demographic-conditioned targeted messaging. We introduce a controlled evaluation framework using three leading models -- GPT-4o, Llama-3.3, and Mistral-Large 2.1 -- across two generation settings: Standalone Generation, which isolates intrinsic demographic effects, and Context-Rich Generation, which incorporates thematic and regional context to emulate realistic targeting. We evaluate generated messages along three dimensions: lexical content, language style, and persuasive framing. We instantiate this framework on climate communication and find consistent age- and gender-based asymmetries across models: male- and youth-targeted messages emphasize agency, innovation, and assertiveness, while female- and senior-targeted messages stress warmth, care, and tradition. Contextual prompts systematically amplify these disparities, with persuasion scores significantly higher for messages tailored to younger or male audiences. Our findings demonstrate how demographic stereotypes can surface and intensify in LLM-generated targeted communication, underscoring the need for bias-aware generation pipelines and transparent auditing frameworks that explicitly account for demographic conditioning in socially sensitive applications.