Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency

📅 2025-11-29

🤖 AI Summary
Existing LLM alignment research employing synthetic personas suffers from inadequate representativeness and low ecological validity: target tasks and user populations are frequently undefined, violating foundational principles of personalized modeling. Method: We systematically reviewed 63 relevant studies using a dual framework of content analysis and methodological assessment, applying multidimensional coding across sociodemographic dimensions and task-relevance criteria. Contribution/Results: Our analysis reveals that only 35% of studies address persona representativeness. To address this gap, we introduce the first transparency checklist specifically designed for LLM persona experiments—emphasizing empirically grounded sampling and context anchoring. This tool shifts persona construction from ad hoc, intuition-driven practices toward methodologically rigorous, reproducible protocols. It provides actionable guidance to enhance the scientific validity and fairness of personalized LLM evaluation, supporting more robust and socially accountable alignment research.

📝 Abstract
Synthetic persona experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerably across studies. Through a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we reveal a critical gap: the task and population of interest are often underspecified in persona-based experiments, despite personalization being fundamentally dependent on these criteria. Our analysis shows substantial differences in user representation, with most studies focusing on a limited set of sociodemographic attributes and only 35% discussing the representativeness of their LLM personae. Based on our findings, we introduce a persona transparency checklist that emphasizes representative sampling, explicit grounding in empirical data, and enhanced ecological validity. Our work provides both a comprehensive assessment of current practices and practical guidelines to improve the rigor and ecological validity of persona-based evaluations in language model alignment research.
Problem

Research questions and friction points this paper is trying to address.

Evaluating representativeness of synthetic personas in LLM alignment studies
Addressing underspecified tasks and populations in persona-based experiments
Improving transparency and ecological validity in persona evaluation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing a persona transparency checklist for LLMs
Emphasizing representative sampling in persona experiments
Grounding synthetic personae in empirical data
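The checklist contributions above could be captured as a simple structured artifact that studies fill in and report. A minimal sketch in Python; the field names here are illustrative assumptions, not the paper's actual checklist items:

```python
from dataclasses import dataclass, fields

# Hypothetical encoding of a persona transparency checklist.
# Field names are illustrative, not taken from the paper.
@dataclass
class PersonaTransparencyReport:
    target_task_specified: bool          # is the evaluation task defined?
    target_population_specified: bool    # is the user population defined?
    representative_sampling: bool        # do personas reflect that population?
    empirical_grounding: bool            # are persona attributes drawn from real data?
    ecological_validity_discussed: bool  # is the realism of the persona context addressed?

    def unmet_items(self) -> list[str]:
        """Return the checklist items the study does not satisfy."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

# Example: a study that defines its task but never grounds its personas empirically.
report = PersonaTransparencyReport(
    target_task_specified=True,
    target_population_specified=False,
    representative_sampling=False,
    empirical_grounding=False,
    ecological_validity_discussed=True,
)
print(report.unmet_items())
```

Such a machine-readable report would let reviewers or meta-analyses aggregate which transparency criteria a body of persona studies meets, in the spirit of the 35% representativeness finding above.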
Jan Batzner
Weizenbaum Institute

Volker Stocker
Weizenbaum Institute, Technical University Berlin

Bingjun Tang
Columbia University

Anusha Natarajan
Columbia University

Qinhao Chen
Columbia University

Stefan Schmid
Weizenbaum Institute, Technical University Berlin

Gjergji Kasneci
Professor at the Technical University of Munich
Responsible Data Science · Responsible AI · Explainable Machine Learning · Algorithmic Accountability