The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

📅 2026-04-27

🤖 AI Summary
This study addresses “persona collapse” in large language models (LLMs) used for multi-agent simulation: agents assigned distinct personas converge toward homogeneous behavior, which undermines the diversity of the simulated population. The work introduces the concept and proposes a quantitative evaluation framework covering coverage, uniformity, and complexity. Using the BFI-44 personality inventory, moral reasoning tasks, and self-introduction generation, the authors evaluate ten mainstream LLMs and find persona collapse across them. Behavioral differences among generated agents stem predominantly from coarse-grained demographic stereotypes rather than individualized traits, and the models with the highest per-persona fidelity produce the most stereotyped populations. The project releases an open-source evaluation toolkit and dataset to support population-level benchmarking and diagnosis of persona diversity in LLM-driven agents.

πŸ“ Abstract
Applications based on large language models (LLMs), such as multi-agent simulations, require population diversity among agents. We identify a pervasive failure mode we term Persona Collapse: agents each assigned a distinct profile nonetheless converge into a narrow behavioral mode, producing a homogeneous simulated population. To quantify persona collapse, we propose a framework that measures how much of the persona space a population occupies (Coverage), how evenly agents spread across it (Uniformity), and how rich the resulting behavioral patterns are (Complexity). Evaluating ten LLMs on personality simulation (BFI-44), moral reasoning, and self-introduction, we observe persona collapse along two axes: (1) Dimensions: a model can appear diverse on one axis yet structurally degenerate on another, and (2) Domains: the same model may collapse the most in personality yet be the most diverse in moral reasoning. Furthermore, item-level diagnostics reveal that behavioral variation tracks coarse demographic stereotypes rather than the fine-grained individual differences specified in each persona. Counter-intuitively, the models achieving the highest per-persona fidelity consistently produce the most stereotyped populations. We release our toolkit and data to support population-level evaluation of LLMs.
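
The abstract names three population-level measures but does not give their formulas. The following is a minimal sketch of what such measures could look like, not the authors' toolkit. It assumes each agent is summarized as a vector of trait scores on a 1 to 5 scale (for example, five BFI scale means). The grid binning, the entropy-based uniformity score, and the participation-ratio proxy for complexity are all illustrative assumptions, and every function name is hypothetical.

```python
import math
from collections import Counter


def bin_index(vector, bins, lo=1.0, hi=5.0):
    """Map a trait vector with scores in [lo, hi] to a cell of a regular grid."""
    width = (hi - lo) / bins
    return tuple(max(0, min(int((x - lo) / width), bins - 1)) for x in vector)


def coverage(population, bins=3):
    """Assumed proxy: fraction of grid cells occupied by at least one agent."""
    dims = len(population[0])
    occupied = {bin_index(v, bins) for v in population}
    return len(occupied) / bins ** dims


def uniformity(population, bins=3):
    """Assumed proxy: Shannon entropy of agents over grid cells, normalized
    by the maximum possible entropy (0 = one cell, 1 = perfectly even)."""
    dims = len(population[0])
    counts = Counter(bin_index(v, bins) for v in population)
    n = len(population)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(bins ** dims)


def effective_dimensionality(population):
    """Assumed proxy for complexity: participation ratio of the covariance
    matrix, (tr C)^2 / tr(C^2), ranging from 1 to the number of traits.
    Returns 0.0 when the population has no variance at all."""
    n, dims = len(population), len(population[0])
    means = [sum(v[j] for v in population) / n for j in range(dims)]
    cov = [[sum((v[i] - means[i]) * (v[j] - means[j]) for v in population) / n
            for j in range(dims)] for i in range(dims)]
    trace = sum(cov[i][i] for i in range(dims))
    # For a symmetric matrix, tr(C^2) equals the sum of squared entries.
    frob_sq = sum(c * c for row in cov for c in row)
    return trace ** 2 / frob_sq if frob_sq > 0 else 0.0


# A fully collapsed population: every agent answers identically.
collapsed = [[3.0, 3.0]] * 4
# A spread-out population: agents occupy all four corners of a 2-trait space.
diverse = [[1.0, 1.0], [1.0, 5.0], [5.0, 1.0], [5.0, 5.0]]

for name, pop in [("collapsed", collapsed), ("diverse", diverse)]:
    print(name, coverage(pop, bins=2), uniformity(pop, bins=2),
          effective_dimensionality(pop))
```

Under these assumptions a collapsed population scores low on all three measures. A population can also score well on one measure and poorly on another, for example broad coverage with variance concentrated in a single direction, which is the kind of dimension-specific degeneration the abstract describes.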
Problem

Research questions and friction points this paper is trying to address.

Persona Collapse
Large Language Models
Population Diversity
Behavioral Homogenization
Multi-agent Simulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Persona Collapse
Population Diversity
Large Language Models
Behavioral Homogenization
Stereotype Amplification