A Mega-Study of Digital Twins Reveals Strengths, Weaknesses and Opportunities for Further Improvement

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study rigorously evaluates whether large language model (LLM)-driven digital twins can accurately represent real individuals’ and populations’ behavioral responses. Method: Across 19 preregistered studies, we benchmarked LLM-based digital twins against nationally representative U.S. survey data (N = 164 behavioral metrics), assessing consistency in individual-level predictions, population-level means, and response variability. Results: Digital twins exhibit only modest fidelity in capturing inter-individual rank-order differences (mean r = 0.2), show limited accuracy in predicting individual responses or aggregate means, and systematically underestimate response variability. Performance is significantly moderated by education, income, and ideological moderation, and varies substantially across domains. Critically, this work provides the first large-scale, empirically grounded behavioral validity benchmark for digital twins—integrating longitudinal individual histories, multi-domain surveys, and strict preregistration—thereby delineating current practical boundaries and establishing a foundational validation framework and actionable pathways for developing trustworthy digital twins.

📝 Abstract
Do "digital twins" capture individual responses in surveys and experiments? We run 19 pre-registered studies on a national U.S. panel and their LLM-powered digital twins (constructed based on previously-collected extensive individual-level data) and compare twin and human answers across 164 outcomes. The correlation between twin and human answers is modest (approximately 0.2 on average) and twin responses are less variable than human responses. While constructing digital twins based on rich individual-level data improves our ability to capture heterogeneity across participants and predict relative differences between them, it does not substantially improve our ability to predict the exact answers given by specific participants or enhance predictions of population means. Twin performance varies by domain and is higher among more educated, higher-income, and ideologically moderate participants. These results suggest current digital twins can capture some degree of relative differences but are unreliable for individual-level predictions and sample mean and variance estimation, underscoring the need for careful validation before use. Our data and code are publicly available for researchers and practitioners interested in optimizing digital twin pipelines.
Problem

Research questions and friction points this paper is trying to address.

Evaluating how well digital twins capture individual survey responses
Assessing correlation and variability between twin and human answers
Identifying limitations in predicting exact individual responses and population statistics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructed LLM-powered digital twins from individual-level data
Compared twin and human responses across 164 survey outcomes
Evaluated twin performance across demographic domains and variability
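The evaluation logic summarized above — a per-outcome correlation between twin and human answers plus a variance comparison — can be sketched on synthetic data. The column names and simulated effect sizes below are illustrative assumptions, not the authors' actual schema or pipeline:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the panel data: each row pairs one participant's
# human answer with their digital twin's answer on one survey outcome.
rng = np.random.default_rng(0)
n_participants, n_outcomes = 200, 5
frames = []
for outcome in range(n_outcomes):
    human = rng.normal(0.0, 1.0, n_participants)
    # Simulate the headline pattern: twins track humans only weakly
    # (r around 0.2) and with compressed variance relative to humans.
    twin = 0.1 * human + rng.normal(0.0, 0.49, n_participants)
    frames.append(pd.DataFrame({
        "participant": np.arange(n_participants),
        "outcome": outcome,
        "human": human,
        "twin": twin,
    }))
df = pd.concat(frames, ignore_index=True)

# Per-outcome twin-human correlation, averaged across outcomes
# (the paper reports a mean of roughly 0.2 over 164 outcomes).
per_outcome_r = df.groupby("outcome").apply(
    lambda g: g["human"].corr(g["twin"]))
print(f"mean twin-human correlation: {per_outcome_r.mean():.2f}")

# Variability check: twin responses are less dispersed than human responses.
print(f"human SD: {df['human'].std():.2f}  twin SD: {df['twin'].std():.2f}")
```

On real data the same two summaries (mean per-outcome correlation and the human-versus-twin standard deviations) would be computed from observed rather than simulated responses.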
Tianyi Peng
Columbia Business School
George Gui
Columbia Business School
Daniel J. Merlau
Columbia Business School
Grace Jiarui Fan
Columbia Business School
Malek Ben Sliman
Columbia Business School
Melanie Brucks
Columbia Business School
Eric J. Johnson
Columbia Business School
Vicki Morwitz
Columbia Business School
Abdullah Althenayyan
Columbia Business School
Silvia Bellezza
Columbia Business School
Dante Donati
Columbia Business School
Hortense Fong
Columbia Business School
Elizabeth Friedman
Columbia Business School
Ariana Guevara
Barnard College
Mohamed Hussein
Columbia Business School
Kinshuk Jerath
Arthur F. Burns Professor of Free and Competitive Enterprise, Professor of Business, Marketing
Marketing -- Retailing, Online Advertising, Data-Based Customer Management
Bruce Kogut
Columbia University, Business School and Sociology
Comparative Governance, Computational Social Science, Organizations
Kristen Lane
Columbia Business School
Hannah Li
Columbia University
Online Platforms, Markets, Experiment Design
Patryk Perkowski
Yeshiva University
Oded Netzer
Arthur J. Samberg Professor of Business, Columbia University
Marketing, Quantitative Marketing, Hidden Markov Models, Text Mining, Unstructured Data
Olivier Toubia
Glaubinger Professor of Business, Columbia Business School
Marketing