Can LLMs Play Ô Ăn Quan Game? A Study of Multi-Step Planning and Decision Making

📅 2025-07-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capacity of large language models (LLMs) for multi-step strategic decision-making, using Ô Ăn Quan, a traditional Vietnamese board game with complex, dynamically evolving rules and a large state space, as a novel benchmark for assessing planning depth, forward reasoning, and state consistency. Method: diverse prompt-driven agents are built on Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, and Llama-3.3-70B-Instruct, integrating behavioral modeling and strategy personalization to systematically evaluate gameplay performance. Contribution/Results: model scale correlates strongly with strategic stability and multi-step planning quality; the 70B model exhibits limited but empirically verifiable deep reasoning. Critically, this work introduces Ô Ăn Quan to the LLM evaluation ecosystem for the first time, establishing a culturally grounded, non-Western benchmark and a methodological framework for assessing AI decision-making across diverse sociocultural contexts.

📝 Abstract
In this paper, we explore the ability of large language models (LLMs) to plan and make decisions through the lens of the traditional Vietnamese board game, Ô Ăn Quan. This game, which involves a series of strategic token movements and captures, offers a unique environment for evaluating the decision-making and strategic capabilities of LLMs. Specifically, we develop various agent personas, ranging from aggressive to defensive, and employ the Ô Ăn Quan game as a testbed for assessing LLM performance across different strategies. Through experimentation with models like Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, and Llama-3.3-70B-Instruct, we aim to understand how these models execute strategic decision-making, plan moves, and manage dynamic game states. The results will offer insights into the strengths and weaknesses of LLMs in terms of reasoning and strategy, contributing to a deeper understanding of their general capabilities.
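To make the setup concrete, here is a minimal sketch of how a persona-conditioned LLM agent for Ô Ăn Quan might be prompted. This is not the paper's implementation: the board encoding, the persona texts, and the prompt format are all illustrative assumptions. Ô Ăn Quan is played on ten citizen squares (five per player, starting with five stones each) flanked by two quan squares, modeled here as a 12-element array.

```python
# Illustrative sketch (not the paper's code): building a persona-conditioned
# prompt for an LLM agent. Board layout, personas, and prompt wording are
# assumptions made for this example.

# Standard starting layout: 10 citizen squares with 5 stones each, plus
# 2 quan squares (indices 0 and 6 here) holding 1 quan piece apiece.
INITIAL_BOARD = [1, 5, 5, 5, 5, 5, 1, 5, 5, 5, 5, 5]
QUAN_SQUARES = {0, 6}

# Example persona descriptions, in the spirit of the paper's
# aggressive-to-defensive spectrum.
PERSONAS = {
    "aggressive": "Prioritize immediate captures, even at the cost of exposure.",
    "defensive": "Keep stones on your side and avoid emptying your squares.",
}

def legal_moves(board, player):
    """A player may sow from any non-empty citizen square on their side."""
    my_squares = range(1, 6) if player == 0 else range(7, 12)
    return [i for i in my_squares if board[i] > 0]

def build_prompt(board, player, persona):
    """Assemble the game state and persona into a single move-selection prompt."""
    moves = legal_moves(board, player)
    return (
        f"You are an O An Quan player with this style: {PERSONAS[persona]}\n"
        f"Board (12 squares, quan at indices 0 and 6): {board}\n"
        f"You are player {player}. Legal squares to sow from: {moves}\n"
        "Reason step by step about the next 2-3 moves, then answer with one "
        "square index and a direction (clockwise or counterclockwise)."
    )
```

At each turn, the prompt would be sent to the chosen Llama model and the reply parsed into a (square, direction) move; the game engine then applies the sowing and capture rules and the loop repeats with the updated board.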
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' strategic decision-making in board games
Evaluating multi-step planning in dynamic game environments
Analyzing LLM performance across different agent personas
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using traditional game for LLM strategic evaluation
Developing diverse agent personas for testing
Testing multiple LLM models on dynamic planning