🤖 AI Summary
This work investigates whether large language models (LLMs) can acquire strategic reasoning capabilities in chess via reinforcement learning (RL). It identifies an inherent limitation: pretrained LLMs exhibit fundamental deficits in strategic understanding that severely constrain RL performance gains. To address this, the authors propose a knowledge-distillation-based dense reward mechanism built on a chess-pretrained action-value network, which distills the judgment of high-accuracy chess engines into fine-grained, action-level feedback and is integrated with supervised fine-tuning and RL policy optimization. Experiments show that dense rewards substantially outperform sparse rewards; nevertheless, all LLM variants plateau far below expert level, exposing a fundamental bottleneck of the pretraining paradigm in deep strategic modeling. The core contributions are (1) the first systematic empirical characterization of the RL plasticity boundary for strategic reasoning in LLMs, and (2) a transferable, distillation-augmented RL framework for strategic skill acquisition.
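The dense-reward idea can be illustrated with a small sketch (not the paper's code; the function names and the stubbed action-value table are hypothetical): the chosen move is scored by an action-value network, and the reward is its value relative to the best legal move, giving a graded signal instead of a binary win/loss outcome.

```python
# Illustrative sketch of a distillation-based dense reward, assuming an
# action-value network that maps (position, move) -> estimated win
# probability. q_value is a stand-in stub; in the paper's setting it
# would be a chess-pretrained network distilled from strong engines.

def q_value(fen: str, move: str) -> float:
    """Hypothetical action-value estimate in [0, 1] for playing
    `move` from position `fen`. Stubbed with a tiny lookup table."""
    table = {
        ("start", "e2e4"): 0.55,  # strong opening move
        ("start", "a2a3"): 0.48,  # passive move
        ("start", "f2f3"): 0.40,  # weakening move
    }
    return table.get((fen, move), 0.5)

def dense_reward(fen: str, move: str, legal_moves: list[str]) -> float:
    """Reward = Q of the chosen move minus the best available Q:
    the optimal move earns 0 and inferior moves earn a graded
    negative signal (contrast with a sparse game-outcome reward)."""
    best_q = max(q_value(fen, m) for m in legal_moves)
    return q_value(fen, move) - best_q

legal = ["e2e4", "a2a3", "f2f3"]
print(dense_reward("start", "e2e4", legal))  # 0.0 for the best move
print(dense_reward("start", "f2f3", legal))  # negative for a weak move
```

Because every sampled move receives feedback, the RL policy gradient gets a signal on each action rather than only at the end of a game, which is the intuition behind dense rewards outperforming sparse ones here.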
📝 Abstract
While reinforcement learning (RL) for large language models (LLMs) has shown promise in mathematical reasoning, strategic reasoning for LLMs using RL remains largely unexplored. We investigate whether LLMs can develop strategic reasoning capabilities through RL in chess. To this end, we leverage a chess-pretrained action-value network to provide dense rewards on the quality of the LLM's output moves, which can be seen as a form of knowledge distillation. Our experiments show that our distillation-based dense rewards often outperform sparse binary rewards. Surprisingly, however, all models plateau far below expert levels. We provide SFT and RL ablations on chess reasoning training and find evidence that this limitation stems from a deficit in the pretrained models' internal understanding of chess, a deficit which RL alone may not be able to fully overcome.