Tracing the Thought of a Grandmaster-level Chess-Playing Transformer

📅 2026-04-11

🤖 AI Summary
This study investigates how Leela Chess Zero (LC0), a grandmaster-level chess-playing Transformer, carries out tactical reasoning. The authors introduce a sparse decomposition framework that decomposes both the MLP and attention modules with sparse replacement layers, and they use causal interventions to test the computational pathways those layers expose. A detailed case study extracts interpretable, empirically verifiable tactical considerations. Three quantitative metrics then indicate that LC0 reasons in parallel, in a way consistent with the inductive bias of its policy head architecture. The authors describe the work as, to their knowledge, the first to decompose a Transformer's internal computation across both MLP and attention modules for interpretability.

📝 Abstract
While modern transformer neural networks achieve grandmaster-level performance in chess and other reasoning tasks, their internal computation remains largely opaque. Focusing on Leela Chess Zero (LC0), we introduce a sparse decomposition framework that interprets its internal computation by decomposing its MLP and attention modules with sparse replacement layers, which capture the primary computation process of LC0. We conduct a detailed case study showing that the computational pathways these layers recover expose rich, interpretable tactical considerations that are empirically verifiable. We further introduce three quantitative metrics and show that LC0 exhibits parallel reasoning behavior consistent with the inductive bias of its policy head architecture. To the best of our knowledge, this is the first work to decompose the internal computation of a transformer on both MLP and attention modules for interpretability. Combining sparse replacement layers and causal interventions in LC0 provides a comprehensive understanding of advanced tactical reasoning, offering critical insights into the underlying mechanisms of superhuman systems. Our code is available at https://github.com/JacklE0niden/Leela-SAEs.
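The two ingredients the abstract names, a sparse replacement layer and a causal intervention on its features, can be illustrated with a toy sketch. This is not the authors' implementation: the class name `SparseReplacementLayer`, the top-k sparsity rule, the dimensions, and the random (untrained) weights are all assumptions chosen for illustration. In practice such a layer would be trained to reproduce the output of the module it replaces.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def topk_relu(z, k):
    """Keep the k largest pre-activations, apply ReLU, zero everything else."""
    keep = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    return [max(v, 0.0) if i in keep else 0.0 for i, v in enumerate(z)]


class SparseReplacementLayer:
    """Hypothetical stand-in for an MLP: encode to sparse features, then decode."""

    def __init__(self, d_in, n_features, d_out, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Random weights for illustration only; a real layer would be trained.
        self.W_enc = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(n_features)]
        self.W_dec = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(d_out)]

    def forward(self, x, ablate=()):
        feats = topk_relu(matvec(self.W_enc, x), self.k)
        # Causal intervention: force the chosen features to zero before decoding.
        feats = [0.0 if i in ablate else a for i, a in enumerate(feats)]
        return matvec(self.W_dec, feats), feats


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(8)]  # stand-in for one square's activation vector
layer = SparseReplacementLayer(d_in=8, n_features=32, d_out=8, k=4)
out, feats = layer.forward(x)

# Ablate each active feature in turn and measure how far the output moves.
for i in [j for j, a in enumerate(feats) if a > 0]:
    out_abl, _ = layer.forward(x, ablate={i})
    effect = sum((a - b) ** 2 for a, b in zip(out, out_abl)) ** 0.5
    print(f"feature {i}: activation {feats[i]:.3f}, ablation effect {effect:.3f}")
```

Because only a handful of features are active for any input, each one can be ablated individually and its downstream effect measured, which is the property that makes this kind of decomposition usable for tracing pathways.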
Problem

Research questions and friction points this paper is trying to address.

interpretability
transformer
chess
reasoning
neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse decomposition
transformer interpretability
Leela Chess Zero
parallel reasoning
causal intervention
Rui Lin
National Institute of Biological Sciences, Beijing, China
Neuroscience, Tool development
Zhenyu Jin
School of Mathematics and Statistics, Xi’an Jiaotong University
Guancheng Zhou
Shanghai Innovation Institute, Shanghai, China; School of Mathematics and Statistics, Xi’an Jiaotong University
Xuyang Ge
Shanghai Innovation Institute, Shanghai, China; School of Computer Science, Fudan University, Shanghai, China
Wentao Shu
Shanghai Innovation Institute, Shanghai, China; School of Computer Science, Fudan University, Shanghai, China
Jiaxing Wu
Shanghai Innovation Institute, Shanghai, China; School of Computer Science, Fudan University, Shanghai, China
Junxuan Wang
Shanghai Innovation Institute, Shanghai, China; School of Computer Science, Fudan University, Shanghai, China
Zhengfu He
Shanghai Innovation Institute
Mechanistic Interpretability, Large Language Models
Junping Zhang
School of Computer Science, Fudan University, Shanghai, China
Xipeng Qiu
Shanghai Innovation Institute, Shanghai, China; School of Computer Science, Fudan University, Shanghai, China