🤖 AI Summary
This study investigates the tactical reasoning mechanisms of Leela Chess Zero (LC0), a high-performance chess-playing Transformer model. It introduces a sparse decomposition framework that is applied to both the MLP and attention modules and combines sparse autoencoders, causal interventions, and quantitative evaluation metrics to dissect the model's critical computational pathways. The authors extract interpretable, empirically verifiable tactical reasoning circuits and propose three metrics to quantify the model's parallel reasoning behavior. The analysis also links this behavior to inductive biases embedded in the policy head architecture. The findings indicate that LC0 reasons in parallel in a way that differs from human cognition, offering insight into the decision-making of complex AI systems.
📝 Abstract
While modern transformer neural networks achieve grandmaster-level performance in chess and other reasoning tasks, their internal computation process remains largely opaque. Focusing on Leela Chess Zero (LC0), we introduce a sparse decomposition framework to interpret its internal computation by decomposing its MLP and attention modules with sparse replacement layers, which capture the primary computation process of LC0. We conduct a detailed case study showing that these pathways expose rich, interpretable tactical considerations that are empirically verifiable. We further introduce three quantitative metrics and show that LC0 exhibits parallel reasoning behavior consistent with the inductive bias of its policy head architecture. To the best of our knowledge, this is the first work to decompose the internal computation of a transformer on both MLP and attention modules for interpretability. Combining sparse replacement layers and causal interventions in LC0 provides a comprehensive understanding of advanced tactical reasoning, offering critical insights into the underlying mechanisms of superhuman systems. Our code is available at https://github.com/JacklE0niden/Leela-SAEs.
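To make the idea of a sparse replacement layer concrete, here is a minimal sketch of one common construction: a top-k sparse dictionary that maps a module's input to an approximation of its output. The abstract does not specify the architecture, so everything here is an assumption for illustration, including the function name `sparse_replacement`, the top-k sparsity rule, and all dimensions. The weights are random and untrained, and the dense MLP is a toy stand-in rather than LC0.

```python
import numpy as np

def sparse_replacement(x, W_enc, b_enc, W_dec, b_dec, k):
    """Predict a dense module's output from its input using at most k features per token.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    acts = np.maximum(x @ W_enc + b_enc, 0.0)  # non-negative feature activations
    n_features = acts.shape[-1]
    if k < n_features:
        # Indices of the (n_features - k) smallest activations in each row.
        drop = np.argpartition(acts, -k, axis=-1)[..., :-k]
        np.put_along_axis(acts, drop, 0.0, axis=-1)
    out = acts @ W_dec + b_dec  # weighted sum of decoder directions
    return out, acts

rng = np.random.default_rng(0)
# Assumed toy sizes; none of these come from the paper.
n_tokens, d_model, d_hidden, n_features, k = 64, 16, 32, 128, 8

# Toy stand-in for a dense MLP module (random weights, not LC0's).
W1 = rng.normal(size=(d_model, d_hidden))
W2 = rng.normal(size=(d_hidden, d_model))
x = rng.normal(size=(n_tokens, d_model))
dense_out = np.maximum(x @ W1, 0.0) @ W2

# Untrained replacement-layer parameters; in practice these would be fit
# so that `out` reconstructs `dense_out`.
W_enc = rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(n_features, d_model))
b_dec = np.zeros(d_model)

out, acts = sparse_replacement(x, W_enc, b_enc, W_dec, b_dec, k)
residual = dense_out - out  # part of the dense computation the sparse layer misses
```

Because each token's output is a sum over at most `k` active features, individual features can be inspected or ablated one at a time, which is the kind of causal intervention the abstract refers to. The `residual` term is kept so that the sparse path plus the residual reproduces the dense output exactly.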