🤖 AI Summary
This study addresses the lack of systematic evaluation of large language models' (LLMs') economic decision-making and resource-management capabilities, which are often overlooked in favor of semantic performance. To bridge this gap, the work proposes Market-Bench, the first multi-agent supply chain simulation framework to incorporate economic competition mechanisms, in which LLMs act as retailers operating under budget constraints. Agents participate in procurement auctions, set dynamic retail prices, and generate marketing slogans that are delivered to buyers through a role-based attention mechanism, with full transaction trajectories recorded. Evaluations across 20 open- and closed-source models employ multidimensional metrics covering economic profit, operational efficiency, and semantic quality. Results show that only a minority of models consistently achieve capital appreciation, while most, despite comparable semantic competence, hover around the break-even point, exhibiting a pronounced winner-take-most dynamic that challenges conventional LLM evaluation paradigms.
📝 Abstract
The ability of large language models (LLMs) to acquire and manage economic resources remains unclear. In this paper, we introduce \textbf{Market-Bench}, a comprehensive benchmark that evaluates LLM capabilities on economically relevant tasks through economic and trade competition. Specifically, we construct a configurable multi-agent supply chain economic model in which LLMs act as retailer agents responsible for procuring and retailing merchandise. In the \textbf{procurement} stage, LLMs bid for limited inventory in budget-constrained auctions. In the \textbf{retail} stage, LLMs set retail prices, generate marketing slogans, and present them to buyers, whose purchase decisions are mediated by a role-based attention mechanism. Market-Bench logs complete trajectories of bids, prices, slogans, sales, and balance-sheet states, enabling automatic evaluation with economic, operational, and semantic metrics. Benchmarking 20 open- and closed-source LLM agents reveals significant performance disparities and a winner-take-most phenomenon, \textit{i.e.}, only a small subset of LLM retailers consistently achieve capital appreciation, while many hover around the break-even point despite similar semantic matching scores. Market-Bench provides a reproducible testbed for studying how LLMs interact in competitive markets.
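The abstract outlines a two-stage loop: budget-constrained procurement auctions followed by retail pricing, slogan generation, and buyer purchases, with every bid, price, and sale logged. The sketch below illustrates that loop under stated assumptions; it is not the Market-Bench implementation. All class and function names are hypothetical, random stub policies stand in for LLM agents, and a simplistic lowest-price buyer replaces the role-based attention mechanism described in the paper.

```python
# Minimal sketch of a two-stage market round (procurement auction + retail),
# as described in the abstract. Names and policies are illustrative assumptions.
import random
from dataclasses import dataclass, field


@dataclass
class Retailer:
    name: str
    budget: float                 # remaining capital (budget constraint)
    inventory: int = 0
    log: list = field(default_factory=list)   # trajectory of bids, prices, sales

    def bid(self, unit_cost: float) -> float:
        # Stub procurement policy; an LLM agent would choose this bid.
        return min(self.budget, unit_cost * random.uniform(0.9, 1.3))

    def set_price_and_slogan(self, unit_cost: float) -> tuple[float, str]:
        # Stub retail policy; an LLM agent would set the price and write the slogan.
        return unit_cost * random.uniform(1.1, 1.6), f"{self.name}: best value today!"


def procurement_auction(retailers, lots: int, unit_cost: float):
    """Budget-constrained auction: the highest bids win the limited inventory lots."""
    bids = sorted(((r.bid(unit_cost), r) for r in retailers),
                  key=lambda x: x[0], reverse=True)
    for price, r in bids[:lots]:
        if price > 0 and r.budget >= price:
            r.budget -= price
            r.inventory += 1
            r.log.append(("buy", price))


def retail_round(retailers, buyers: int, unit_cost: float):
    """Buyers pick the cheapest offer; a role-based matcher could weight slogans instead."""
    offers = [(r, *r.set_price_and_slogan(unit_cost)) for r in retailers if r.inventory > 0]
    for _ in range(buyers):
        if not offers:
            break
        r, price, slogan = min(offers, key=lambda o: o[1])   # simplistic buyer choice
        r.budget += price
        r.inventory -= 1
        r.log.append(("sell", price, slogan))
        offers = [(x, p, s) for x, p, s in offers if x.inventory > 0]


if __name__ == "__main__":
    shops = [Retailer(f"shop-{i}", budget=100.0) for i in range(3)]
    for _ in range(5):                        # five simulated market rounds
        procurement_auction(shops, lots=4, unit_cost=10.0)
        retail_round(shops, buyers=6, unit_cost=10.0)
    for s in shops:
        print(s.name, round(s.budget, 2), s.inventory)   # final balance-sheet state
```

In this toy version, capital appreciation corresponds to a shop finishing with a budget above its starting 100.0; the logged trajectories are what economic, operational, and semantic metrics would be computed over.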