🤖 AI Summary
This study investigates the capacity of large language models (LLMs) for multi-step strategic decision-making, using Ô Ăn Quan—a traditional Vietnamese board game with complex, dynamically evolving rules and state space—as a novel benchmark to assess planning depth, forward reasoning, and state consistency. Method: The authors develop diverse prompt-driven agents based on Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, and Llama-3.3-70B-Instruct, integrating behavioral modeling and strategy personalization to systematically evaluate gameplay performance. Contribution/Results: Model scale strongly correlates with strategic stability and multi-step planning quality; the 70B model exhibits limited yet empirically verifiable deep reasoning capabilities. Critically, this work introduces Ô Ăn Quan into the LLM evaluation ecosystem for the first time, establishing a culturally grounded, non-Western benchmark and methodological framework for assessing AI decision-making in diverse sociocultural contexts.
📝 Abstract
In this paper, we explore the ability of large language models (LLMs) to plan and make decisions through the lens of the traditional Vietnamese board game Ô Ăn Quan. This game, which involves a series of strategic token movements and captures, offers a unique environment for evaluating the decision-making and strategic capabilities of LLMs. Specifically, we develop various agent personas, ranging from aggressive to defensive, and employ the Ô Ăn Quan game as a testbed for assessing LLM performance across different strategies. Through experiments with Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, and Llama-3.3-70B-Instruct, we aim to understand how these models execute strategic decision-making, plan moves, and manage dynamic game states. The results offer insights into the strengths and weaknesses of LLMs in terms of reasoning and strategy, contributing to a deeper understanding of their general capabilities.
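To make the persona-driven setup concrete, here is a minimal sketch of how such prompt-based agents could be assembled. The persona names, prompt wording, and board encoding are illustrative assumptions, not the authors' actual prompts; the paper only states that personas range from aggressive to defensive.

```python
# Hypothetical sketch of persona-driven prompting for O An Quan.
# The persona text and board encoding below are assumptions for
# illustration, not the paper's exact prompts.

PERSONAS = {
    "aggressive": "Prioritize capturing opponent stones, even at short-term risk.",
    "defensive": "Protect your own squares and avoid moves that expose stones to capture.",
}

def build_move_prompt(persona: str, board: list[int], legal_moves: list[str]) -> str:
    """Assemble a single-turn prompt asking the model to choose one legal move.

    `board` is an illustrative flat list of stone counts per square;
    the real game state also tracks the two quan squares and captured scores.
    """
    style = PERSONAS[persona]
    return (
        "You are playing the Vietnamese board game O An Quan.\n"
        f"Play style: {style}\n"
        f"Board state (stones per square): {board}\n"
        f"Legal moves: {', '.join(legal_moves)}\n"
        "Reply with exactly one move from the list above."
    )

# Example: an aggressive agent deciding between two candidate moves.
prompt = build_move_prompt(
    "aggressive",
    [5] * 10 + [10, 10],  # 10 citizen squares, 2 quan squares (values illustrative)
    ["square 3, clockwise", "square 7, counter-clockwise"],
)
```

The returned string would then be sent to the chosen Llama model, whose reply is parsed back into a game move; scaling the same prompt across the 3B, 8B, and 70B models is what enables the cross-scale comparison described above.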