🤖 AI Summary
This study investigates the capacity of large language models (LLMs) for multi-step strategic decision-making, using Ô Ăn Quan—a traditional Vietnamese board game with complex, dynamically evolving rules and state space—as a novel benchmark to assess planning depth, forward reasoning, and state consistency. Method: The authors develop diverse prompt-driven agents based on Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, and Llama-3.3-70B-Instruct, integrating behavioral modeling and strategy personalization to systematically evaluate gameplay performance. Contribution/Results: Model scale strongly correlates with strategic stability and multi-step planning quality; the 70B model exhibits limited yet empirically verifiable deep reasoning capabilities. Critically, this work introduces Ô Ăn Quan into the LLM evaluation ecosystem for the first time, establishing a culturally grounded, non-Western benchmark and methodological framework for assessing AI decision-making in diverse sociocultural contexts.
📝 Abstract
In this paper, we explore the ability of large language models (LLMs) to plan and make decisions through the lens of the traditional Vietnamese board game Ô Ăn Quan. This game, which involves a series of strategic token movements and captures, offers a unique environment for evaluating the decision-making and strategic capabilities of LLMs. Specifically, we develop various agent personas, ranging from aggressive to defensive, and employ the Ô Ăn Quan game as a testbed for assessing LLM performance across different strategies. Through experiments with Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, and Llama-3.3-70B-Instruct, we aim to understand how these models execute strategic decision-making, plan moves, and manage dynamic game states. The results offer insights into the strengths and weaknesses of LLMs in terms of reasoning and strategy, contributing to a deeper understanding of their general capabilities.
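To make the persona-driven setup concrete, here is a minimal sketch of how such prompt-based agents could be assembled. The persona names, prompt wording, and board encoding are illustrative assumptions, not the authors' actual prompts; the paper only states that personas range from aggressive to defensive.

```python
# Hypothetical sketch of persona-driven prompting for O An Quan.
# The persona text and board encoding below are assumptions for
# illustration, not the paper's exact prompts.

PERSONAS = {
    "aggressive": "Prioritize capturing opponent stones, even at short-term risk.",
    "defensive": "Protect your own squares and avoid moves that expose stones to capture.",
}

def build_move_prompt(persona: str, board: list[int], legal_moves: list[str]) -> str:
    """Assemble a single-turn prompt asking the model to choose one legal move.

    `board` is an illustrative flat list of stone counts per square;
    the real game state also tracks the two quan squares and captured scores.
    """
    style = PERSONAS[persona]
    return (
        "You are playing the Vietnamese board game O An Quan.\n"
        f"Play style: {style}\n"
        f"Board state (stones per square): {board}\n"
        f"Legal moves: {', '.join(legal_moves)}\n"
        "Reply with exactly one move from the list above."
    )

# Example: an aggressive agent deciding between two candidate moves.
prompt = build_move_prompt(
    "aggressive",
    [5] * 10 + [10, 10],  # 10 citizen squares, 2 quan squares (values illustrative)
    ["square 3, clockwise", "square 7, counter-clockwise"],
)
```

The returned string would then be sent to the chosen Llama model, whose reply is parsed back into a game move; scaling the same prompt across the 3B, 8B, and 70B models is what enables the cross-scale comparison described above.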