Predicting Field Experiments with Large Language Models

📅 2025-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large language models (LLMs) can reliably predict the outcomes of field experiments in economics and the social sciences. The authors develop the first systematic evaluation framework, running zero-shot and few-shot predictions across 319 canonical field experiments. The method introduces structured experimental-design prompting and domain-knowledge injection to improve model reasoning about causal interventions. Results show that LLMs achieve 78% prediction accuracy, significantly outperforming conventional baselines, providing the first empirical evidence of LLMs' validity in forecasting real-world social behavior interventions. The authors further identify systematic performance disparities along gender, ethnicity, and social-norm dimensions. This work extends the application frontier of LLMs to causal inference in the social sciences and proposes an interpretable, socially aware sensitivity-analysis paradigm, establishing both a methodological foundation and a practical pathway for AI-augmented empirical social science.

📝 Abstract
Large language models (LLMs) have demonstrated unprecedented emergent capabilities, including content generation, translation, and the simulation of human behavior. Field experiments, despite their high cost, are widely employed in economics and the social sciences to study real-world human behavior through carefully designed manipulations and treatments. However, whether and how these models can be utilized to predict outcomes of field experiments remains unclear. In this paper, we propose and evaluate an automated LLM-based framework that produces predictions of field experiment outcomes. Applying this framework to 319 experiments drawn from renowned economics literature yields a notable prediction accuracy of 78%. Interestingly, we find that performance is highly skewed. We attribute this skewness to several factors, including gender differences, ethnicity, and social norms.
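The abstract describes an automated framework that prompts an LLM with an experiment's design and collects its predicted outcome. A minimal sketch of what one zero-shot prediction step might look like is below; the function names, the prompt schema, and the example experiment are illustrative assumptions, not the authors' actual code or prompts.

```python
# Illustrative sketch (not the paper's implementation): turn a field
# experiment's design into a structured prompt, then map the model's
# free-form reply onto a small set of directional outcome labels.

LABELS = ("INCREASE", "DECREASE", "NO_EFFECT")

def build_prompt(experiment: dict) -> str:
    """Compose a structured experimental-design prompt (hypothetical schema)."""
    return (
        "You are predicting the outcome of a field experiment.\n"
        f"Setting: {experiment['setting']}\n"
        f"Treatment: {experiment['treatment']}\n"
        f"Control: {experiment['control']}\n"
        f"Outcome measured: {experiment['outcome']}\n"
        f"Answer with exactly one of: {', '.join(LABELS)}."
    )

def parse_prediction(reply: str) -> str:
    """Map a free-form model reply onto one of the allowed labels."""
    upper = reply.upper()
    for label in LABELS:
        if label in upper:
            return label
    return "NO_EFFECT"  # conservative default when the reply is unparseable

# Hypothetical example experiment, for demonstration only.
example = {
    "setting": "door-to-door charitable fundraising",
    "treatment": "flyer announcing the solicitation one day in advance",
    "control": "no advance notice",
    "outcome": "share of households that donate",
}
prompt = build_prompt(example)
print(parse_prediction("The treatment will likely DECREASE giving."))  # DECREASE
```

In a real run, `prompt` would be sent to an LLM API and the reply fed to `parse_prediction`; accuracy would then be scored against the published experimental result, as the paper does across its 319 experiments.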
Problem

Research questions and friction points this paper is trying to address.

Predicting field experiment outcomes using LLMs
Evaluating LLM accuracy in simulating human behavior
Identifying factors affecting prediction performance skewness
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based framework predicts field experiments
Automated system achieves 78% accuracy
Considers gender, ethnicity, social norms
Yaoyu Chen
The Department of Information and Decision Sciences at the College of Business Administration, University of Illinois at Chicago
Yuheng Hu
Ohio State University
Social Computing · Human-AI Interaction · Information Systems · Digital Platforms
Yingda Lu
Associate professor, University of Illinois Chicago
Information Systems