🤖 AI Summary
Current LLM behavioral evaluation lacks standardized, reproducible experimental design guidelines. This paper systematically integrates principles from experimental economics into LLM research, proposing eight actionable experimental design tactics and establishing a methodological framework, Experimental Economics for LLMs (Eco-LLM). Two controlled experiments validating these tactics demonstrate that Eco-LLM improves experimental rigor, result reproducibility, and cross-model comparability. The framework addresses a gap in LLM evaluation methodology and extends the theoretical scope and practical applicability of experimental economics to the AI era. By grounding LLM assessment in established behavioral and incentive-aligned experimental paradigms, Eco-LLM provides a generalizable methodological foundation for human-AI interaction studies, mechanism design in AI systems, and trustworthy AI evaluation.
📝 Abstract
Advancements in large language models (LLMs) have sparked growing interest in measuring and understanding their behavior through experimental economics. However, there are still no established guidelines for designing economic experiments for LLMs. By combining principles from experimental economics with insights from LLM research in artificial intelligence, we outline and discuss eight practical tactics for conducting experiments with LLMs. We further perform two sets of experiments to demonstrate the significance of these tactics. Our study enhances the design, replicability, and generalizability of LLM experiments, and broadens the scope of experimental economics in the digital age.