Out-of-distribution Tests Reveal Compositionality in Chess Transformers

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether Decision Transformers genuinely acquire chess rules and exhibit systematic generalization. Method: We train a 270M-parameter autoregressive Decision Transformer on human game data and evaluate it on out-of-distribution (OOD) test sets, including Chess960 variants and nonstandard openings. Contribution/Results: The model maintains high move legality and quality under OOD conditions and adapts robustly to unconventional initial positions. Crucially, we observe—during training—the spontaneous emergence of a hard constraint enforcing “move only own pieces,” indicating internalized rule abstraction. In live Lichess play, the model achieves performance comparable to symbolic AI engines and substantially surpasses random baselines. This work provides the first empirical evidence that large language model–inspired architectures can autonomously abstract structural rule knowledge from raw sequential data in complex combinatorial rule systems—a key finding for neuro-symbolic integration.

📝 Abstract
Chess is a canonical example of a task that requires rigorous reasoning and long-term planning. Modern Decision Transformers - trained similarly to LLMs - are able to learn competent gameplay, but it is unclear to what extent they truly capture the rules of chess. To investigate this, we train a 270M-parameter chess Transformer and test it on out-of-distribution scenarios designed to reveal failures of systematic generalization. Our analysis shows that Transformers exhibit compositional generalization, as evidenced by strong rule extrapolation: they adhere to the fundamental syntactic rules of the game by consistently choosing valid moves even in situations very different from the training data. Moreover, they also generate high-quality moves for OOD puzzles. In a more challenging test, we evaluate the models on variants including Chess960 (Fischer Random Chess), a variant of chess in which the starting positions of the pieces are randomized. We find that while the model exhibits basic strategy adaptation, it is inferior to symbolic AI algorithms that perform explicit search, though the gap narrows when playing against users on Lichess. Moreover, the training dynamics reveal that the model initially learns to move only its own pieces, suggesting an emergent compositional understanding of the game.
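The "move only own pieces" constraint reported in the abstract can be illustrated with a small, self-contained check. This is a toy sketch of our own, not the paper's evaluation harness; the function names (`parse_fen_board`, `moves_own_piece`, `own_piece_rate`) are hypothetical helpers introduced here for illustration.

```python
# Toy sketch (not the paper's code): measure how often a model's proposed
# moves pick up a piece belonging to the side to move -- the constraint
# the paper reports emerging during training.

def parse_fen_board(fen):
    """Parse the piece-placement field of a FEN string into a dict
    mapping squares like 'e2' to piece letters (uppercase = White)."""
    board = {}
    rows = fen.split()[0].split("/")
    for rank_idx, row in enumerate(rows):
        file_idx = 0
        for ch in row:
            if ch.isdigit():
                file_idx += int(ch)  # digits encode runs of empty squares
            else:
                square = "abcdefgh"[file_idx] + str(8 - rank_idx)
                board[square] = ch
                file_idx += 1
    return board

def moves_own_piece(fen, uci_move):
    """True if the move's source square holds a piece of the side to move."""
    board = parse_fen_board(fen)
    side = fen.split()[1]  # 'w' or 'b'
    piece = board.get(uci_move[:2])
    if piece is None:
        return False
    return piece.isupper() if side == "w" else piece.islower()

def own_piece_rate(fen, proposed_moves):
    """Fraction of proposed moves that respect the own-piece constraint."""
    ok = sum(moves_own_piece(fen, m) for m in proposed_moves)
    return ok / len(proposed_moves)

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
# e7e5 moves Black's pawn while White is to move, so it fails the check.
print(own_piece_rate(start, ["e2e4", "g1f3", "e7e5"]))
```

Tracking this rate over training checkpoints is one way to observe the constraint emerging, as the paper describes.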
Problem

Research questions and friction points this paper is trying to address.

Testing chess Transformers' generalization on out-of-distribution scenarios
Investigating whether Transformers truly capture chess rules systematically
Evaluating model adaptation to randomized chess variants like Chess960
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer tested via out-of-distribution chess scenarios
Model demonstrates rule extrapolation and compositional generalization
Evaluated on Chess960 with basic strategy adaptation
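For context on the Chess960 evaluation above: Chess960's randomized starting positions obey two constraints (the bishops start on opposite-colored squares, and the king starts between the two rooks). A minimal sampler sketch, written here for illustration only; `random_chess960_back_rank` is our own hypothetical helper, not from the paper:

```python
import random

def random_chess960_back_rank(rng=random):
    """Sample a valid Chess960 back rank: bishops on opposite-colored
    squares, king placed between the two rooks."""
    squares = [None] * 8
    # Bishops: one on a light square (even file), one on a dark square (odd file).
    squares[rng.choice(range(0, 8, 2))] = "B"
    squares[rng.choice(range(1, 8, 2))] = "B"
    # Queen and knights go on any remaining squares.
    free = [i for i, s in enumerate(squares) if s is None]
    for piece in ("Q", "N", "N"):
        idx = rng.choice(free)
        squares[idx] = piece
        free.remove(idx)
    # The last three squares, left to right, get rook, king, rook --
    # which guarantees the king sits between the rooks.
    for i, piece in zip(sorted(free), ("R", "K", "R")):
        squares[i] = piece
    return "".join(squares)

print(random_chess960_back_rank())
```

Sampling such ranks gives starting positions far from the training distribution, which is what makes Chess960 a useful OOD probe.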
Anna Mészáros
University of Cambridge, Cambridge, United Kingdom
Patrik Reizinger
Max Planck Institute for Intelligent Systems
Causal Representation Learning · Independent Component Analysis · Machine Learning · Causal Inference
Ferenc Huszár
University of Cambridge, Cambridge, United Kingdom