🤖 AI Summary
This study evaluates when large language model (LLM)-driven digital personas can reliably stand in for human respondents in survey research. Using the LISS longitudinal panel, the authors build personas from respondents' background variables and pre-2023 survey histories, then compare persona outputs with the same respondents' held-out post-cutoff answers. They test four persona architectures, including retrieval augmentation, on three LLMs and two prediction tasks, and assess fidelity at five levels: question, respondent, distributional, equity, and clustering. Personas improve alignment with human response distributions, especially in domains tied to stable attributes and values, but remain limited for individual-level prediction and fail to recover multivariate respondent structure. Retrieval-augmented architectures give the clearest gains, yet performance depends more on the structure of human responses than on model choice: personas do best on low-variability questions and common respondent patterns, and worst on subjective, heterogeneous, or rare responses. The authors use these results to offer guidance on when digital personas may be appropriate for survey research and when human validation remains necessary.
📝 Abstract
Digital personas powered by Large Language Models (LLMs) are increasingly proposed as substitutes for human survey respondents, yet it remains unclear when they can reliably approximate human survey findings. We answer this question using the LISS panel, constructing personas from respondents' background variables and pre-2023 survey histories, then testing them against the same respondents' held-out post-cutoff answers. Across four persona architectures, three LLMs, and two prediction tasks, we assess performance at the question, respondent, distributional, equity, and clustering levels. Digital personas improve alignment with human response distributions, especially in domains tied to stable attributes and values, but remain limited for individual prediction and fail to recover multivariate respondent structure. Retrieval-augmented architectures provide the clearest gains, but performance depends more on human response structure than on model choice: personas perform best for low-variability questions and common respondent patterns, and worst for subjective, heterogeneous, or rare responses. Our results provide practical guidance on when digital personas could be appropriate for survey research and when human validation remains necessary.
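The gap between distributional alignment and individual prediction can be made concrete with a small sketch. The Python below is illustrative only: the function names, the toy answers, and the choice of total variation distance as the distributional measure are assumptions made here, not the metrics or data reported in the paper. It shows that a set of persona answers can match the human answer distribution exactly while agreeing with each respondent's own answer only a third of the time.

```python
from collections import Counter


def individual_accuracy(human, persona):
    """Share of respondents whose persona answer equals their own held-out answer."""
    matches = sum(h == p for h, p in zip(human, persona))
    return matches / len(human)


def total_variation_distance(human, persona):
    """Half the L1 distance between the two answer distributions (0 means identical)."""
    human_counts, persona_counts = Counter(human), Counter(persona)
    options = set(human_counts) | set(persona_counts)
    return 0.5 * sum(
        abs(human_counts[o] / len(human) - persona_counts[o] / len(persona))
        for o in options
    )


# Hypothetical held-out answers for six respondents (toy data, not from LISS).
human = ["agree", "agree", "disagree", "neutral", "agree", "disagree"]
# Hypothetical persona answers for the same six respondents, in the same order.
persona = ["agree", "disagree", "agree", "agree", "neutral", "disagree"]

print("individual accuracy:", individual_accuracy(human, persona))
print("total variation distance:", total_variation_distance(human, persona))
```

In this toy case both lists contain three "agree", two "disagree", and one "neutral", so the distributions coincide, yet only two of the six respondents are predicted correctly. This is one way the pattern described in the abstract can arise.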