Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) exhibit limited performance on complex spatial strategic reasoning tasks—such as Chinese chess—particularly in maintaining long-term strategic consistency within fully observable games. To address this, we propose Xiangqi-R1: a multi-stage training paradigm designed specifically for Chinese chess, integrating rule-constrained data injection, supervised fine-tuning on expert-annotated strategies, and Group Relative Policy Optimization (GRPO) reinforced by multi-dimensional rewards (engine evaluation + expert labeling). Trained on 5 million board-move pairs, the 7B-parameter model achieves end-to-end move generation and high-fidelity position analysis. Experiments demonstrate that Xiangqi-R1 improves move legality by 18% and critical position analysis accuracy by 22% over general-purpose baseline LLMs. This work is the first to empirically validate that structured domain-specific training enables LLMs to develop stable, interpretable spatial strategic reasoning capabilities.
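The summary describes a multi-dimensional reward that combines engine evaluation with expert labeling. A minimal sketch of how such a scalar reward might be assembled is shown below; the weights, helper names, and the flat illegal-move penalty are illustrative assumptions, not details from the paper.

```python
# Hypothetical reward combination for one generated move, assuming a
# weighted sum of engine evaluation and expert-label agreement, with a
# hard penalty for illegal moves. All constants are illustrative.

def combined_reward(move_legal: bool,
                    engine_score: float,      # engine evaluation, assumed in [-1, 1]
                    matches_expert: bool,
                    w_engine: float = 0.7,
                    w_expert: float = 0.3) -> float:
    """Return a scalar reward for a single generated move."""
    if not move_legal:
        return -1.0                # illegal moves are penalized outright
    reward = w_engine * engine_score
    if matches_expert:
        reward += w_expert         # bonus for agreeing with the expert label
    return reward
```

Gating everything behind a legality check mirrors the paper's emphasis on move legality as a first-class training signal.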

📝 Abstract
Game playing has long served as a fundamental benchmark for evaluating Artificial General Intelligence (AGI). While Large Language Models (LLMs) have demonstrated impressive capabilities in general reasoning, their effectiveness in spatial strategic reasoning, which is critical for complex and fully observable board games, remains insufficiently explored. In this work, we adopt Chinese Chess (Xiangqi) as a challenging and rich testbed due to its intricate rules and spatial complexity. To advance LLMs' strategic competence in such environments, we propose a training framework tailored to Xiangqi, built upon a large-scale dataset of five million board-move pairs enhanced with expert annotations and engine evaluations. Building on this foundation, we introduce Xiangqi-R1, a 7B-parameter model trained in a multi-stage manner: (1) fine-tuning for legal move prediction to capture basic spatial rules, (2) incorporating strategic annotations to improve decision-making, and (3) applying reinforcement learning via Group Relative Policy Optimization (GRPO) with multi-dimensional reward signals to enhance reasoning stability. Our experimental results indicate that, despite their size and power, general-purpose LLMs struggle to achieve satisfactory performance on these tasks. Compared with general-purpose LLMs, Xiangqi-R1 achieves an 18% increase in move legality and a 22% gain in analysis accuracy. Our results point to a promising path for building general strategic intelligence in spatially complex domains.
Problem

Research questions and friction points this paper is trying to address.

Enhancing spatial strategic reasoning in LLMs for Chinese Chess
Improving decision-making in complex board games via reinforcement learning
Addressing LLMs' limitations in spatial complexity and strategic competence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning for legal move prediction
Incorporating strategic annotations for decisions
Applying GRPO reinforcement learning for stability
👥 Authors
Yuhao Chen, University of Science and Technology of China
Shuochen Liu, University of Science and Technology of China
Yuanjie Lyu, University of Science and Technology of China
Chao Zhang, University of Science and Technology of China
Jiayao Shi, University of Science and Technology of China
Tong Xu, University of Science and Technology of China