Mini Amusement Parks (MAPs): A Testbed for Modelling Business Decisions

📅 2025-11-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Real-world business decision-making requires handling four things at once: open-ended goal optimization, active modeling of the environment under sparse feedback, long-horizon planning in stochastic settings, and spatial reasoning. Existing human-AI benchmarks evaluate only subsets of these capabilities, so they cannot assess integrated decision-making competence. To address this gap, the paper introduces Mini Amusement Parks (MAPs), an amusement-park simulation benchmark that combines all four dimensions in a single environment, and provides LLM-based agent implementations along with human performance baselines. Experiments show that state-of-the-art LLM agents reach only about 15.4% and 10.2% of human performance on easy and medium difficulty, respectively (humans outperform them by 6.5x and 9.8x), with persistent weaknesses in long-horizon optimization, sample-efficient learning, spatial reasoning, and world modelling.

📝 Abstract

Despite rapid progress in artificial intelligence, current systems struggle with the interconnected challenges that define real-world decision making. Practical domains, such as business management, require optimizing an open-ended and multi-faceted objective, actively learning environment dynamics from sparse experience, planning over long horizons in stochastic settings, and reasoning over spatial information. Yet existing human-AI benchmarks isolate subsets of these capabilities, limiting our ability to assess holistic decision-making competence. We introduce Mini Amusement Parks (MAPs), an amusement-park simulator designed to evaluate an agent's ability to model its environment, anticipate long-term consequences under uncertainty, and strategically operate a complex business. We provide human baselines and a comprehensive evaluation of state-of-the-art LLM agents, finding that humans outperform these systems by 6.5x on easy mode and 9.8x on medium mode. Our analysis reveals persistent weaknesses in long-horizon optimization, sample-efficient learning, spatial reasoning, and world modelling. By unifying these challenges within a single environment, MAPs offers a new foundation for benchmarking agents capable of adaptable decision making. Code: https://github.com/Skyfall-Research/MAPs
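
The abstract reports the human advantage as ratios (6.5x and 9.8x), while the AI summary reports agent performance as percentages of human performance (15.4% and 10.2%). The two are the same figures expressed differently, since the percentage is simply 100 divided by the ratio. A minimal sketch of that conversion follows; the variable and function names are illustrative and do not come from the MAPs codebase.

```python
# Human-over-agent performance ratios as stated in the abstract.
human_advantage = {"easy": 6.5, "medium": 9.8}


def agent_share_of_human(ratio: float) -> float:
    """Return agent performance as a percentage of human performance."""
    return 100.0 / ratio


# Round to one decimal place to match the precision used in the summary.
shares = {
    mode: round(agent_share_of_human(ratio), 1)
    for mode, ratio in human_advantage.items()
}
print(shares)  # {'easy': 15.4, 'medium': 10.2}
```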

Problem

Research questions and friction points this paper is trying to address.

Developing a testbed to evaluate holistic AI decision-making in business environments

Exposing where current agents fall short in long-horizon optimization and spatial reasoning

Creating unified benchmarks for adaptable decision-making under uncertainty

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates amusement parks for business decision testing

Evaluates how well agents model their environment and anticipate long-term consequences

Benchmarks long-horizon optimization and spatial reasoning
