The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses “persona collapse” in large language models (LLMs) used for multi-agent simulations: agents assigned distinct personas converge toward homogeneous behavior, which undermines population diversity. The work names this failure mode and proposes a quantitative evaluation framework built on three measures: coverage, uniformity, and complexity. Using the BFI-44 personality inventory, moral reasoning tasks, and self-introduction generation, the authors evaluate ten LLMs and observe persona collapse across them. Findings show that behavioral variation among agents tracks coarse demographic stereotypes rather than the fine-grained individual traits specified in each persona, and that the models with the highest per-persona fidelity produce the most stereotyped populations. The authors release their evaluation toolkit and dataset to support population-level evaluation of LLM-driven agents.

📝 Abstract

Applications based on large language models (LLMs), such as multi-agent simulations, require population diversity among agents. We identify a pervasive failure mode we term Persona Collapse: agents each assigned a distinct profile nonetheless converge into a narrow behavioral mode, producing a homogeneous simulated population. To quantify persona collapse, we propose a framework that measures how much of the persona space a population occupies (Coverage), how evenly agents spread across it (Uniformity), and how rich the resulting behavioral patterns are (Complexity). Evaluating ten LLMs on personality simulation (BFI-44), moral reasoning, and self-introduction, we observe persona collapse along two axes: (1) Dimensions: a model can appear diverse on one axis yet structurally degenerate on another, and (2) Domains: the same model may collapse the most in personality yet be the most diverse in moral reasoning. Furthermore, item-level diagnostics reveal that behavioral variation tracks coarse demographic stereotypes rather than the fine-grained individual differences specified in each persona. Counter-intuitively, the models achieving the highest per-persona fidelity consistently produce the most stereotyped populations. We release our toolkit and data to support population-level evaluation of LLMs.
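
The abstract names three population-level measures but does not give their formulas here. The sketch below shows one way such measures could be computed for a population of trait vectors (for example, Big Five scores on a 1 to 5 scale). Every definition in it is an assumption chosen for illustration, not the paper's method: coverage is taken as the fraction of occupied grid cells, uniformity as the normalized Shannon entropy of cell occupancy, and complexity as the participation ratio of the covariance matrix. The function names, the bin count, and the score range are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def _cell(vec, lo, hi, bins):
    """Map a trait vector to the index of the grid cell it falls in."""
    width = (hi - lo) / bins
    return tuple(min(int((x - lo) / width), bins - 1) for x in vec)


def coverage_and_uniformity(pop, lo=1.0, hi=5.0, bins=2):
    """Illustrative proxies (assumed definitions, not the paper's).

    coverage   = occupied grid cells / total grid cells
    uniformity = Shannon entropy of cell occupancy / log(total cells)
    """
    dims = len(pop[0])
    total_cells = bins ** dims
    counts = Counter(_cell(v, lo, hi, bins) for v in pop)
    n = len(pop)
    coverage = len(counts) / total_cells
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    uniformity = abs(entropy) / math.log(total_cells) if total_cells > 1 else 0.0
    return coverage, uniformity


def complexity(pop):
    """Participation ratio of the covariance matrix C (assumed proxy).

    Computed as trace(C)^2 / sum of squared entries of C, which equals
    (sum of eigenvalues)^2 / (sum of squared eigenvalues) for symmetric C.
    It ranges from 1 (variation along one direction) to the number of
    dimensions (equal, uncorrelated variation); 0 means no variation.
    """
    n, dims = len(pop), len(pop[0])
    means = [sum(v[j] for v in pop) / n for j in range(dims)]
    cov = [
        [sum((v[i] - means[i]) * (v[j] - means[j]) for v in pop) / n
         for j in range(dims)]
        for i in range(dims)
    ]
    trace = sum(cov[i][i] for i in range(dims))
    frob_sq = sum(c * c for row in cov for c in row)
    return 0.0 if frob_sq == 0 else trace * trace / frob_sq


# Two toy populations of five-dimensional trait vectors (made-up data).
collapsed = [[3.0] * 5 for _ in range(20)]                    # every agent identical
diverse = [list(p) for p in product([1.0, 5.0], repeat=5)]    # one agent per corner

if __name__ == "__main__":
    for name, pop in (("collapsed", collapsed), ("diverse", diverse)):
        cov, uni = coverage_and_uniformity(pop)
        print(name, cov, uni, complexity(pop))
```

Under these assumed definitions, a fully collapsed population scores near zero on all three measures, while a population with one agent at each corner of the trait space scores at the maximum. A population can also score well on one measure and poorly on another, for example evenly spread over a few cells (high uniformity among occupied cells, low coverage), which is the kind of disagreement between dimensions the abstract describes.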

Problem

Research questions and friction points this paper is trying to address.

Persona Collapse

Large Language Models

Population Diversity

Behavioral Homogenization

Multi-agent Simulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Persona Collapse

Population Diversity

Large Language Models