Using Large Language Models to Simulate Human Behavioural Experiments: Port of Mars

📅 2025-06-05
🤖 AI Summary
Traditional human behavioral experiments suffer from limited sample sizes and insufficient sociodemographic heterogeneity. To address this, this study pioneers the application of large language models (LLMs) to complex institutional economics scenarios—specifically, the collective-risk social dilemma (CRSD)—using the canonical sustainability experiment “Port of Mars” as a benchmark. We propose a behavior simulation framework integrating multi-model prompt engineering, role-based personality injection, and sociodemographic attribute control, enabling scalable, heterogeneous human decision-making simulation. Empirical evaluation demonstrates that LLM-simulated populations closely approximate real human data in cooperation rates, dynamic response patterns, and group-level differentiation; moreover, they systematically replicate cross-cultural and cross-educational behavioral variations. This work establishes a novel LLM-driven paradigm for social behavior simulation, extending beyond prior applications confined to simple games or political surveys.
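
To make the idea of role-based persona injection with sociodemographic attribute control more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Persona` fields, the prompt wording, the placeholder role and country names, and the helper functions `build_prompt` and `persona_grid` are all assumptions made for illustration. The sketch only shows how crossing a few attributes yields a heterogeneous population of prompts; sending them to one or more LLMs is left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona record; the field names are illustrative only."""
    role: str
    country: str
    education: str


def build_prompt(persona: Persona, scenario: str) -> str:
    """Compose one prompt that injects a role and sociodemographic attributes."""
    return (
        f"You are a participant from {persona.country} "
        f"with {persona.education} education, playing the role of {persona.role}.\n"
        f"Scenario: {scenario}\n"
        "State how much of your resources you contribute to the group."
    )


def persona_grid(roles, countries, educations):
    """Cross every attribute value to obtain a heterogeneous simulated population."""
    return [Persona(r, c, e) for r, c, e in product(roles, countries, educations)]


# Placeholder attribute values (assumed, not taken from the paper).
personas = persona_grid(
    roles=["Role A", "Role B"],
    countries=["Country A", "Country B"],
    educations=["secondary", "university"],
)
prompts = [build_prompt(p, "A shared habitat needs upkeep each round.") for p in personas]
```

Each prompt in `prompts` differs only in the injected attributes, so differences in the simulated responses can be traced back to the persona rather than to the scenario text.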

📝 Abstract
Collective risk social dilemmas (CRSD) highlight a trade-off between individual preferences and the need for all to contribute toward achieving a group objective. Problems such as climate change fall into this category, so it is critical to understand their social underpinnings. However, rigorous CRSD methodology often demands large-scale human experiments, and it is difficult to guarantee sufficient statistical power and heterogeneity across socio-demographic factors. Generative AI offers a potential complementary approach to address this problem. By replacing human participants with large language models (LLMs), it allows for a scalable empirical framework. This paper focuses on the validity of this approach and on whether it is feasible to represent a large-scale, human-like experiment with sufficient diversity using LLMs. In particular, where previous literature has focused on political surveys, virtual towns and classical game-theoretic examples, we focus on a complex CRSD used in the institutional economics and sustainability literature known as Port of Mars.
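
The trade-off described in the abstract can be illustrated with a generic threshold game, sketched below in Python. This is an assumed, simplified collective-risk setup and not the Port of Mars rules: the function name `crsd_payoffs` and the parameters `threshold` and `loss_prob` are hypothetical. Each player keeps whatever they do not contribute, but if the group total falls short of the threshold, everyone risks losing everything.

```python
import random


def crsd_payoffs(endowments, contributions, threshold, loss_prob, rng):
    """Payoffs for one round of a generic collective-risk threshold game.

    Assumed rules (illustrative, not Port of Mars mechanics):
    - each player keeps endowment minus contribution;
    - if total contributions reach the threshold, the group is safe;
    - otherwise, with probability loss_prob, every player loses everything.
    """
    kept = [e - c for e, c in zip(endowments, contributions)]
    if sum(contributions) >= threshold:
        return kept
    if rng.random() < loss_prob:
        return [0 for _ in kept]
    return kept


rng = random.Random(0)
# Group meets the threshold: free riders keep more than contributors.
safe = crsd_payoffs([10, 10, 10], [5, 5, 0], threshold=10, loss_prob=0.9, rng=rng)
# Group misses the threshold and the loss is certain: everyone ends with nothing.
failed = crsd_payoffs([10, 10, 10], [2, 2, 0], threshold=10, loss_prob=1.0, rng=rng)
```

In the first call the non-contributor ends up with the highest payoff, which is the individual temptation; in the second call the same behaviour, spread across the group, wipes out everyone's payoff.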

Problem

Research questions and friction points this paper is trying to address.

Simulating human behavior in collective risk dilemmas using LLMs
Assessing validity of LLMs for large-scale human-like experiments
Exploring diversity representation in complex social dilemma scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs to simulate human experiments
Scalable framework for CRSD studies
Validating diversity in LLM-based simulations

Oliver Slumbers
Helsing AI
Multi-Agent Systems, Reinforcement Learning

Joel Z. Leibo
Google DeepMind

Marco A. Janssen
Arizona State University