Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach

📅 2026-04-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current large language model (LLM)-based social simulations often collapse populations into an “average persona,” masking subgroup heterogeneity. This study applies audience segmentation to restore that heterogeneity. Using U.S. climate-opinion survey data and two open-weight models (Llama 3.1-70B and Mixtral 8x22B), it compares six segmentation configurations that vary identifier granularity, parsimony, and selection logic (theory-driven, data-driven, and instrument-based), and evaluates them on distributional, structural, and predictive fidelity. Findings indicate that moderate enrichment of identifiers can improve fidelity, while further expansion does not reliably help and can worsen structural and predictive fidelity; compact configurations often match or outperform more comprehensive ones; and selection logics have complementary strengths (instrument-based selection best preserves distributional shape, data-driven selection best recovers between-group structure and identifier-outcome associations), so no single configuration dominates all dimensions.
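To make the contrast with an "average persona" concrete, here is a minimal Python sketch of segment-conditioned prompting. It is not the paper's pipeline: the segment names, their population shares, the prompt wording, and the helper names `allocate` and `build_prompts` are all hypothetical, chosen only to show how simulated respondents could be spread across segments in proportion to their size instead of being drawn from a single generic persona.

```python
from math import floor

# Hypothetical segments and population shares (illustrative only, not from the paper).
SEGMENTS = {
    "concerned": 0.5,
    "cautious": 0.3,
    "doubtful": 0.2,
}


def allocate(shares, n):
    """Split n simulated respondents across segments in proportion to their shares.

    Uses largest-remainder rounding so the counts always sum exactly to n.
    """
    raw = {name: share * n for name, share in shares.items()}
    counts = {name: floor(value) for name, value in raw.items()}
    leftover = n - sum(counts.values())
    # Hand the remaining seats to the segments with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


def build_prompts(shares, n, question):
    """Return one persona prompt per simulated respondent, grouped by segment."""
    prompts = []
    for name, count in allocate(shares, n).items():
        for _ in range(count):
            # Assumed prompt template; a real study would use richer identifiers.
            prompts.append(
                f"You are a U.S. adult in the '{name}' audience segment. {question}"
            )
    return prompts


if __name__ == "__main__":
    for prompt in build_prompts(SEGMENTS, 5, "How worried are you about climate change?"):
        print(prompt)
```

Each prompt would then be sent to the model separately, so the pooled responses reflect the segment mix rather than one averaged respondent.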

📝 Abstract

Large Language Models (LLMs) are increasingly used to simulate social attitudes and behaviors, offering scalable "silicon samples" that can approximate human data. However, current simulation practice often collapses diversity into an "average persona," masking subgroup variation that is central to social reality. This study introduces audience segmentation as a systematic approach for restoring heterogeneity in LLM-based social simulation. Using U.S. climate-opinion survey data, we compare six segmentation configurations across two open-weight LLMs (Llama 3.1-70B and Mixtral 8x22B), varying segmentation identifier granularity, parsimony, and selection logic (theory-driven, data-driven, and instrument-based). We evaluate simulation performance with a three-dimensional evaluation framework covering distributional, structural, and predictive fidelity. Results show that increasing identifier granularity does not produce consistent improvement: moderate enrichment can improve performance, but further expansion does not reliably help and can worsen structural and predictive fidelity. Across parsimony comparisons, compact configurations often match or outperform more comprehensive alternatives, especially in structural and predictive fidelity, while distributional fidelity remains metric dependent. Identifier selection logic determines which fidelity dimension benefits most: instrument-based selection best preserves distributional shape, whereas data-driven selection best recovers between-group structure and identifier-outcome associations. Overall, no single configuration dominates all dimensions, and performance gains in one dimension can coincide with losses in another. These findings position audience segmentation as a core methodological approach for valid LLM-based social simulation and highlight the need for heterogeneity-aware evaluation and variance-preserving modeling strategies.
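The three fidelity dimensions can be illustrated with a small Python sketch. The abstract does not name the specific metrics used, so everything below is an assumption: total variation distance stands in for distributional fidelity, the correlation of segment means for structural fidelity, and the gap in identifier-outcome correlation for predictive fidelity. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def total_variation(human, simulated, scale=(1, 2, 3, 4, 5)):
    """Distributional stand-in: 0 means identical response shares, 1 means disjoint."""
    h, s = Counter(human), Counter(simulated)
    return 0.5 * sum(abs(h[k] / len(human) - s[k] / len(simulated)) for k in scale)


def pearson(x, y):
    """Plain Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def segment_means(responses, segments):
    """Mean response per segment label."""
    groups = {}
    for response, label in zip(responses, segments):
        groups.setdefault(label, []).append(response)
    return {label: sum(values) / len(values) for label, values in groups.items()}


def structural_fidelity(human, human_seg, simulated, sim_seg):
    """Structural stand-in: do simulated segment means order like the human ones?"""
    hm = segment_means(human, human_seg)
    sm = segment_means(simulated, sim_seg)
    shared = sorted(set(hm) & set(sm))
    return pearson([hm[g] for g in shared], [sm[g] for g in shared])


def predictive_gap(human_x, human_y, sim_x, sim_y):
    """Predictive stand-in: gap between identifier-outcome correlations."""
    return abs(pearson(human_x, human_y) - pearson(sim_x, sim_y))


if __name__ == "__main__":
    segs = ["a", "a", "b", "b", "c", "c"]
    human = [1, 1, 3, 3, 5, 5]
    simulated = [2, 2, 3, 3, 4, 4]  # same ordering of segments, compressed variance
    print(total_variation(human, simulated))
    print(structural_fidelity(human, segs, simulated, segs))
```

In this toy case the simulated data keeps the between-segment ordering while compressing the spread, so the structural score is high even though the distributional distance is not zero. That is one way a configuration can gain on one dimension and lose on another.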

Problem

Research questions and friction points this paper is trying to address.

heterogeneity

LLM-based social simulation

audience segmentation

subgroup variation

social attitudes

Innovation

Methods, ideas, or system contributions that make the work stand out.

audience segmentation

LLM-based social simulation

heterogeneity restoration

fidelity evaluation

persona diversity

🔎 Similar Papers

Decoding Echo Chambers: LLM-Powered Simulations Revealing Polarization in Social Networks

2024-09-28 · International Conference on Computational Linguistics · Citations: 5

GenSim: A General Social Simulation Platform with Large Language Model based Agents

2024-10-06 · Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations) · Citations: 13
