GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 3
Influential: 1
🤖 AI Summary
Existing evaluations of large language models (LLMs) in economic decision-making lack standardized, reproducible frameworks, hindering systematic assessment of rationality, efficiency, and fairness—particularly in sequential, language-mediated economic interactions. Method: We introduce the first open-source benchmark for two-player, sequential, language-based economic games, comprising three parameterized game families. Our methodology integrates LLM-driven interactive simulation, multidimensional performance attribution analysis, and statistical causal inference to quantify self-interest, Pareto efficiency, and fairness. Contribution/Results: Empirical analysis reveals significant coupling between market structure and model architecture, with LLM behavior exhibiting strong sensitivity to economic environment parameters. This work establishes the first reproducible behavioral analysis paradigm for LLMs in economic contexts, providing empirically grounded design principles for AI-augmented online markets, recommendation systems, and other real-world economic applications.

📝 Abstract
Large Language Models (LLMs) show significant potential in economic and strategic interactions, where communication via natural language is often prevalent. This raises key questions: Do LLMs behave rationally? How do they perform compared to humans? Do they tend to reach an efficient and fair outcome? What is the role of natural language in strategic interaction? How do characteristics of the economic environment influence these dynamics? These questions become crucial concerning the economic and societal implications of integrating LLM-based agents into real-world data-driven systems, such as online retail platforms and recommender systems. To answer these questions, we introduce a benchmark for standardizing research on two-player, sequential, language-based games. Inspired by the economic literature, we define three base families of games with consistent parameterization, degrees of freedom and economic measures to evaluate agents' performance (self-gain), as well as the game outcome (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and utilize it to collect a dataset of LLM vs. LLM interactions across numerous game configurations and an additional dataset of human vs. LLM interactions. Through extensive experimentation, we demonstrate how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents in various economic contexts; (ii) evaluate agents in both individual and collective performance measures; and (iii) quantify the effect of the economic characteristics of the environments on the behavior of agents. Our results suggest that the market parameters, as well as the choice of the LLMs, tend to have complex and interdependent effects on the economic outcome, which calls for careful design and analysis of the language-based economic ecosystem.
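The abstract names three economic measures: self-gain for individual agents, and efficiency and fairness for the game outcome. The paper's exact formulas are not given here, so the sketch below uses common illustrative definitions (payoff normalized by the maximum attainable, total-surplus ratio, and a payoff-gap fairness score); all function names and formulas are assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch of per-agent and collective outcome measures for a
# two-player economic game. Definitions are illustrative assumptions.

def self_gain(payoff: float, max_payoff: float) -> float:
    """Individual performance: the agent's share of its maximum attainable payoff."""
    return payoff / max_payoff

def efficiency(payoffs: tuple[float, float], max_total: float) -> float:
    """Collective outcome: realized total payoff relative to the maximum total surplus."""
    return sum(payoffs) / max_total

def fairness(payoffs: tuple[float, float]) -> float:
    """Equality of the split: 1.0 for an equal split, approaching 0 as it diverges."""
    a, b = payoffs
    total = a + b
    if total == 0:
        return 1.0
    return 1.0 - abs(a - b) / total

# One simulated interaction: player 1 earns 6, player 2 earns 4,
# out of a maximum joint surplus of 12.
outcome = (6.0, 4.0)
print(efficiency(outcome, max_total=12.0))  # 0.8333...
print(fairness(outcome))                    # 0.8
```

Separating individual measures (self-gain) from collective ones (efficiency, fairness) mirrors the paper's distinction between evaluating agents and evaluating game outcomes.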
Problem

Research questions and friction points this paper is trying to address.

Assessing LLM rationality and performance in economic interactions
Evaluating efficiency and fairness in LLM-based strategic outcomes
Analyzing natural language's role in economic environment dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for two-player language-based games
Open-source interaction simulation framework
Datasets for LLM and human interactions