Large language models replicate and predict human cooperation across experiments in game theory

📅 2025-11-06
🤖 AI Summary
Large language models (LLMs) are increasingly deployed to simulate social decision-making, yet their capacity to authentically reproduce human cooperative behavior has not been systematically verified. This study introduces a game-theoretic digital twin framework that evaluates LLMs’ ability to replicate and predict human cooperation patterns at the population level, without relying on role-based prompting. Using open-source models, including Llama, Mistral, and Qwen, the authors employ systematic prompting strategies and behavioral probing mechanisms. Results show that Llama achieves high-fidelity replication of human cooperation rates and empirically observed irrational biases, whereas Qwen converges more closely toward Nash equilibrium predictions. Notably, the models also generate novel, empirically testable hypotheses about cooperation in previously unexamined game scenarios, thereby extending traditional experimental boundaries. This work presents the first scalable, role-agnostic simulation of social behavior with verifiable predictive validity, establishing both methodological foundations and empirical evidence for the credible application of LLMs in the social sciences.

📝 Abstract
Large language models (LLMs) are increasingly used both to make decisions in domains such as health, education and law, and to simulate human behavior. Yet how closely LLMs mirror actual human decision-making remains poorly understood. This gap is critical: misalignment could produce harmful outcomes in practical applications, while failure to replicate human behavior renders LLMs ineffective for social simulations. Here, we address this gap by developing a digital twin of game-theoretic experiments and introducing a systematic prompting and probing framework for machine-behavioral evaluation. Testing three open-source models (Llama, Mistral and Qwen), we find that Llama reproduces human cooperation patterns with high fidelity, capturing human deviations from rational choice theory, while Qwen aligns closely with Nash equilibrium predictions. Notably, we achieved population-level behavioral replication without persona-based prompting, simplifying the simulation process. Extending beyond the original human-tested games, we generate and preregister testable hypotheses for novel game configurations outside the original parameter grid. Our findings demonstrate that appropriately calibrated LLMs can replicate aggregate human behavioral patterns and enable systematic exploration of unexplored experimental spaces, offering a complementary approach to traditional research in the social and behavioral sciences that generates new empirical predictions about human social decision-making.
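The population-level comparison the abstract describes, measuring an LLM population's aggregate cooperation rate against the Nash-equilibrium baseline, can be sketched as follows. This is an illustrative toy, not the authors' code: the payoff matrix, the stand-in `llm_choose` policy, and the 0.47 human-like cooperation probability are all assumptions for demonstration.

```python
# Toy sketch (assumed, not from the paper): aggregate cooperation rate of a
# simulated agent population in a one-shot Prisoner's Dilemma, compared with
# the rational-choice (Nash) prediction of universal defection.
import random

# Row player's payoffs: (my_action, opponent_action) -> payoff (illustrative values)
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # mutual defection: the unique Nash equilibrium
}

def llm_choose(p_cooperate: float) -> str:
    """Stand-in for querying an LLM agent; cooperates with probability p."""
    return "C" if random.random() < p_cooperate else "D"

def cooperation_rate(n_trials: int, p_cooperate: float) -> float:
    """Fraction of simulated agents choosing to cooperate."""
    choices = [llm_choose(p_cooperate) for _ in range(n_trials)]
    return choices.count("C") / n_trials

random.seed(0)
observed = cooperation_rate(10_000, p_cooperate=0.47)  # assumed human-like rate
nash_rate = 0.0  # rational-choice prediction: everyone defects
print(f"observed cooperation: {observed:.3f}, Nash prediction: {nash_rate:.3f}")
```

In the paper's framing, a model like Llama would produce an `observed` value near human cooperation rates, while a model like Qwen would sit closer to `nash_rate`.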
Problem

Research questions and friction points this paper is trying to address.

Evaluating how closely LLMs replicate human decision-making in game theory
Developing a systematic framework to test LLM behavioral alignment
Exploring LLMs' ability to generate novel hypotheses about human cooperation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Digital twin of game-theoretic experiments for evaluation
Systematic prompting and probing framework for behavior
Generating preregistered hypotheses for novel game configurations
Andrea Cera Palatsi
Department for Computational Social Sciences and Humanities, Barcelona Supercomputing Center

Samuel Martin-Gutierrez
Grupo de Sistemas Complejos, Universidad Politécnica de Madrid

Ana S. Cardenal
Universitat Oberta de Catalunya, Barcelona Supercomputing Center
Digital Media, News Audiences, Public Opinion, Political Behavior, Computational Methods

Max Pellert
Barcelona Supercomputing Center
Computational Social Science, Cognitive Science, Complexity Science, Data Science, AI