Generative Exaggeration in LLM Social Agents: Consistency, Bias, and Toxicity

📅 2025-07-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the reliability of large language models (LLMs) in simulating political discourse on social media. We construct LLM agents (Gemini, Mistral, DeepSeek) grounded in 21 million political interactions from X (formerly Twitter), initialized via zero- and few-shot prompting, and systematically evaluate their response consistency, ideological bias, and generation of harmful language. Results reveal that LLMs do not merely imitate user behavior but exhibit "generative exaggeration": a systematic, optimization-driven distortion in which they reconstruct rather than reproduce observed user traits. While contextual enrichment improves response consistency, it simultaneously amplifies ideological polarization, stylistic stereotyping, and toxicity. The work presents empirical evidence of structural unreliability in LLMs as social agents and introduces "generative exaggeration" as a conceptual framework for the trustworthy evaluation of AI-mediated social simulation.

📝 Abstract
We investigate how Large Language Models (LLMs) behave when simulating political discourse on social media. Leveraging 21 million interactions on X during the 2024 U.S. presidential election, we construct LLM agents based on 1,186 real users, prompting them to reply to politically salient tweets under controlled conditions. Agents are initialized either with minimal ideological cues (Zero Shot) or recent tweet history (Few Shot), allowing one-to-one comparisons with human replies. We evaluate three model families (Gemini, Mistral, and DeepSeek) across linguistic style, ideological consistency, and toxicity. We find that richer contextualization improves internal consistency but also amplifies polarization, stylized signals, and harmful language. We observe an emergent distortion that we call "generative exaggeration": a systematic amplification of salient traits beyond empirical baselines. Our analysis shows that LLMs do not emulate users; they reconstruct them. Their outputs reflect internal optimization dynamics more than observed behavior, introducing structural biases that compromise their reliability as social proxies. This challenges their use in content moderation, deliberative simulations, and policy modeling.
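
The prompts themselves are not reproduced in this summary. As a rough illustration of the two initialization conditions described in the abstract, the sketch below builds a Zero Shot prompt (minimal ideological cue only) and a Few Shot prompt (cue plus recent tweet history). All prompt wording, the persona field, and the history length are hypothetical assumptions, not the paper's actual templates.

```python
# Hypothetical sketch of the two agent-initialization conditions.
# Prompt wording, persona fields, and N_HISTORY are assumptions,
# not the paper's actual prompts.

N_HISTORY = 10  # number of recent tweets shown in the Few Shot condition (assumed)

def build_zero_shot_prompt(leaning: str, stimulus_tweet: str) -> str:
    """Zero Shot: only a minimal ideological cue, no tweet history."""
    return (
        f"You are a {leaning} user on X. "
        f"Reply to the following tweet as you normally would:\n\n"
        f"{stimulus_tweet}"
    )

def build_few_shot_prompt(leaning: str, history: list[str], stimulus_tweet: str) -> str:
    """Few Shot: the same cue plus the user's recent tweet history."""
    examples = "\n".join(f"- {t}" for t in history[:N_HISTORY])
    return (
        f"You are a {leaning} user on X. Here are your recent tweets:\n"
        f"{examples}\n\n"
        f"Reply to the following tweet in the same voice:\n\n"
        f"{stimulus_tweet}"
    )

# Example usage with toy data (not real tweets from the dataset):
prompt = build_few_shot_prompt(
    leaning="conservative",
    history=["Taxes are too high.", "Great rally tonight!"],
    stimulus_tweet="The debate last night changed nothing.",
)
print(prompt)
```

In the study's design, each agent reply generated under these conditions is then compared one-to-one with the real user's reply to the same tweet.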
Problem

Research questions and friction points this paper is trying to address.

Analyze LLM behavior in political discourse simulation
Assess ideological consistency and toxicity in LLM outputs (see the consistency sketch after this list)
Identify generative exaggeration and structural biases in LLMs
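
The summary does not specify how consistency is operationalized. Below is a minimal sketch of one plausible proxy: cosine similarity between a matched agent reply and human reply using sentence embeddings. The `all-MiniLM-L6-v2` encoder and the similarity-as-consistency reading are illustrative assumptions, not the paper's method.

```python
# Hypothetical consistency proxy: cosine similarity between an agent's
# reply and the matched human reply to the same stimulus tweet.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder

def reply_consistency(agent_reply: str, human_reply: str) -> float:
    """Return cosine similarity in [-1, 1]; higher = more consistent."""
    emb = model.encode([agent_reply, human_reply], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# Toy example, not real data:
score = reply_consistency(
    "This policy will wreck small businesses.",
    "Small businesses can't survive another mandate like this.",
)
print(f"consistency ≈ {score:.2f}")
```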
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agents simulate political discourse on X
Controlled comparisons with human replies
Analyze linguistic style, consistency, and toxicity (a toxicity-scoring sketch follows this list)
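
The toxicity classifier used in the paper is likewise not named in this summary. A minimal sketch using the open-source Detoxify model as a stand-in, scoring matched agent/human reply pairs; under the paper's findings, generative exaggeration would show up as the agent score systematically exceeding the human score across many pairs.

```python
# Hypothetical toxicity check: score matched agent/human reply pairs with
# the open-source Detoxify classifier (a stand-in assumption; the paper's
# actual toxicity tool is not named in this summary).
from detoxify import Detoxify

scorer = Detoxify("original")  # pretrained English toxicity model

pairs = [
    # (agent_reply, human_reply) — toy examples, not real data
    ("Anyone who votes for him is an idiot.", "I just don't agree with his platform."),
]

for agent_reply, human_reply in pairs:
    agent_tox = scorer.predict(agent_reply)["toxicity"]
    human_tox = scorer.predict(human_reply)["toxicity"]
    # A positive delta on average would indicate the agent amplifies
    # toxicity beyond the empirical human baseline.
    print(f"agent: {agent_tox:.3f}  human: {human_tox:.3f}  delta: {agent_tox - human_tox:+.3f}")
```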