🤖 AI Summary
Existing static financial benchmarks inadequately assess the true capabilities of large language models (LLMs) in dynamic wealth management. This work addresses the gap by formally modeling three classic financial board games (Cashflow, Acquire, and Monopoly) as multi-agent environments, which together form a dynamic evaluation framework. The study systematically evaluates nine state-of-the-art LLMs on long-term decision-making tasks involving cash management, merger-and-acquisition investments, and asset bidding. Results reveal that while these models exhibit basic investment reasoning, they consistently overlook liquidity risk and are prone to financial distress under stochastic shocks. The findings highlight a significant disparity between the models' strong static reasoning abilities and their weaker dynamic decision-making performance, and point to a critical bottleneck in translating static financial knowledge into sustained, adaptive gains in volatile environments.
📝 Abstract
Recently, large language models (LLMs) have achieved superior performance in static financial reasoning and simple dynamic trading tasks. However, existing static financial benchmarks are insufficient to assess the dynamic wealth management and financial decision-making capabilities of LLMs in real-world environments. To bridge this gap, we present FinBoardBench, an evaluation suite based on three classic financial board games: Cashflow, Acquire, and Monopoly. FinBoardBench assesses a comprehensive set of financial skills, including personal cash flow management with debt balancing, corporate investment and acquisition forecasting, and competitive trade negotiations with asset auctions. Our experiments with 9 advanced LLMs reveal that, while these models exhibit basic long-term planning and investment logic, they fail to effectively leverage complex interactions for profit, and their strong static reasoning performance does not translate into successful dynamic decision-making. Notably, they tend to prioritize immediate asset acquisition over maintaining sufficient liquidity, making them vulnerable to financial crises triggered by random events. We hope that FinBoardBench can serve as a valuable reference for developing more intelligent LLM-based decision-making systems in the future.