Mixture-of-Personas Language Models for Population Simulation

πŸ“… 2025-04-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Pretrained large language models (LLMs) struggle to capture the behavioral diversity of real human populations, limiting their credibility in social science research and human behavior modeling. To address this, we propose the probabilistic Mixture-of-Personas (MoP) prompting framework, a tuning-free method that represents subgroup-level personality traits and behavioral exemplars as mixture components with context-aware, learnable weights, ensuring cross-model generalizability. MoP integrates persona-driven agent design, stratified behavioral exemplar sampling, and weighted response aggregation to achieve fine-grained alignment with target population distributions. Experiments on synthetic data generation show that MoP significantly improves demographic alignment and behavioral diversity, outperforming state-of-the-art prompting and fine-tuning baselines across multiple dimensions. By enabling scalable, low-barrier population-aligned modeling without architectural modification or parameter updates, MoP establishes a new paradigm for social computing and human-AI collaborative behavioral modeling.

πŸ“ Abstract
Advances in Large Language Models (LLMs) have paved the way for their emerging applications in various domains, such as human behavior simulation, where LLMs can augment human-generated data in social science research and machine learning model training. However, pretrained LLMs often fail to capture the behavioral diversity of target populations due to the inherent variability across individuals and groups. To address this, we propose Mixture of Personas (MoP), a probabilistic prompting method that aligns LLM responses with the target population. MoP is a contextual mixture model in which each component is an LM agent characterized by a persona and an exemplar representing subpopulation behaviors. The persona and exemplar are randomly chosen according to the learned mixing weights to elicit diverse LLM responses during simulation. MoP is flexible, requires no model fine-tuning, and is transferable across base models. Experiments on synthetic data generation show that MoP outperforms competing methods on alignment and diversity metrics.
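The abstract's core mechanism, drawing a (persona, exemplar) mixture component according to learned mixing weights and conditioning the base LLM's prompt on it, can be sketched as below. This is a minimal illustration, not the authors' implementation: the component contents, the weight values, and the prompt template are all placeholder assumptions.

```python
import random

# Hypothetical MoP-style components: each pairs a persona describing a
# subpopulation with an exemplar of that subpopulation's behavior.
components = [
    {"persona": "a retired teacher in a rural area, age 68",
     "exemplar": "I rarely shop online; I prefer the local store."},
    {"persona": "a college student in a large city, age 20",
     "exemplar": "I order most things from my phone."},
    {"persona": "a working parent in the suburbs, age 41",
     "exemplar": "I buy household items in bulk once a month."},
]
weights = [0.5, 0.3, 0.2]  # learned mixing weights (placeholder values)

def sample_prompt(question, rng=random):
    # Draw one mixture component according to the mixing weights, then
    # build the persona-conditioned prompt for the (unmodified) base LLM.
    comp = rng.choices(components, weights=weights, k=1)[0]
    return (f"You are {comp['persona']}.\n"
            f"An example of how you respond: \"{comp['exemplar']}\"\n"
            f"Question: {question}")

print(sample_prompt("How often do you shop online?"))
```

Repeated sampling yields a stream of prompts whose persona frequencies match the mixture weights, which is how the method elicits population-level diversity without any fine-tuning or architectural change.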
Problem

Research questions and friction points this paper is trying to address.

LLMs lack behavioral diversity in population simulations
Aligning LLM responses with target population variability
Generating diverse synthetic data without model finetuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic prompting for population alignment
Contextual mixture model with persona agents
No model finetuning, transferable across models
Authors

Ngoc Bui, Yale University
Hieu Trung Nguyen, The Chinese University of Hong Kong
Shantanu Kumar, Yale University
Julian Theodore, Yale University
Weikang Qiu, PhD student, Yale University (Machine Learning, Neuroscience)
Viet Anh Nguyen, The Chinese University of Hong Kong (Machine Learning, Optimization, Decision Analytics, Operations Research)
Rex Ying, Yale University