Complete Chess Games Enable LLM Become A Chess Master

📅 2025-01-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) struggle with end-to-end chess reasoning due to difficulties in directly modeling board states and legal move constraints. Method: We encode full games as text sequences, representing positions in Forsyth–Edwards Notation (FEN) and annotating optimal moves; supervised fine-tuning enables LLMs to generate complete, legally valid, and coherent games—achieving the first end-to-end generation of high-quality full-length chess games by an LLM. Contribution/Results: Training on long-horizon, high-consistency game data yields a +350 Elo gain, demonstrating that data structure critically governs game-theoretic reasoning capability. Integrated with multi-sample decoding, our model achieves 1788 Elo on the Stockfish benchmark (10 samples), substantially outperforming prior text-based baselines. This establishes a novel, purely language-driven pathway toward strong chess AI, bypassing traditional symbolic or reinforcement-learning architectures.
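The summary describes encoding each position in FEN and annotating the optimal move, then fine-tuning on the resulting text. A minimal sketch of what such a (position, best-move) training record could look like is below; the prompt template and field names are hypothetical illustrations, not the paper's actual format.

```python
# Sketch of FEN-based training data (assumed format): each example pairs a
# FEN board state with its annotated best move, serialized as text for
# supervised fine-tuning. The prompt/completion layout here is hypothetical.

def make_training_example(fen: str, best_move: str) -> dict:
    """Turn one annotated position into a prompt/completion pair."""
    return {
        "prompt": f"FEN: {fen}\nBest move:",
        "completion": f" {best_move}",
    }

def game_to_examples(positions: list[tuple[str, str]]) -> list[dict]:
    """A full game becomes a sequence of (FEN, best move) pairs, one per ply."""
    return [make_training_example(fen, mv) for fen, mv in positions]

# Toy example: the opening position and the move 1. e4 in coordinate notation.
start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
examples = game_to_examples([(start_fen, "e2e4")])
print(examples[0]["prompt"])
```

Serializing whole games this way (rather than isolated positions) is what the summary credits for the long-horizon consistency gains.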

📝 Abstract
Large language models (LLMs) have shown remarkable abilities in text generation, question answering, language translation, reasoning, and many other tasks. They continue to advance rapidly and are becoming increasingly influential in various fields, from technology and business to education and entertainment. Despite LLMs' success in multiple areas, their ability to play abstract games, such as chess, is underexplored. Chess-playing requires a language model to output legal and reasonable moves from textual inputs. Here, we propose ChessLLM, a large language model that plays full chess games. We transform the game into a textual format with the best move represented in Forsyth-Edwards Notation. We show that with simple supervised fine-tuning, our model achieves a professional-level Elo rating of 1788 in matches against the standard Elo-rated Stockfish when permitted to sample 10 times. We further show that data quality is important: supervision with long-round data yields a 350 Elo improvement over short-round data.
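The "sample 10 times" evaluation suggests multi-sample decoding: draw several candidate moves from the model and play one that is legal. One plausible reading of that procedure is sketched below; the sampling loop, the toy model, and the fallback rule are assumptions for illustration, not the paper's exact decoding scheme.

```python
import random

def best_of_n_move(sample_move, legal_moves, n=10, seed=0):
    """Sample up to n candidate moves from the model and play the first
    one that is legal in the current position; fall back to a random
    legal move if every sample is illegal. (Assumed decoding scheme.)"""
    rng = random.Random(seed)
    for _ in range(n):
        move = sample_move(rng)
        if move in legal_moves:
            return move
    return rng.choice(sorted(legal_moves))

# Hypothetical stand-in for the fine-tuned model: sometimes emits an
# illegal move ("e2e5") that the legality filter must reject.
def toy_model(rng):
    return rng.choice(["e2e5", "e2e4", "g1f3"])

legal = {"e2e4", "g1f3", "d2d4"}
print(best_of_n_move(toy_model, legal))
```

Resampling like this trades inference cost for legality and strength, which is consistent with the reported gap between single-sample and 10-sample Elo.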
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Chess
Text Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

ChessLLM
Elo rating improvement
Text-based chess learning
Authors
Yinqi Zhang
Xintian Han (ByteDance · Machine Learning)
Haolong Li
Kedi Chen
Shaohui Lin