🤖 AI Summary
While large language models exhibit logical reasoning capabilities, their internal mechanisms remain poorly understood—particularly how they implement symbolic deduction. Method: This work investigates small-scale transformer models on deductive reasoning tasks, introducing a mechanistic interpretability framework to systematically analyze model representations and computational circuits. Contribution/Results: We identify induction heads as the core architectural components responsible for rule completion and rule chaining—revealing, for the first time at the neuron level, how models explicitly learn and execute logical rules rather than relying on statistical correlations. Empirical evaluation confirms that small models internalize formal rules and faithfully reproduce human-like step-by-step inference. Our analysis provides the first fine-grained, empirically verifiable neural account of symbolic reasoning in language models, establishing a mechanistic basis for understanding rule-based deduction in neural architectures.
📝 Abstract
Recent large language models have demonstrated notable capabilities in solving problems that require logical reasoning; however, the corresponding internal mechanisms remain largely unexplored. In this paper, we show that a small language model can solve a deductive reasoning task by learning the underlying rules, rather than operating as a purely statistical learner. We then provide a low-level explanation of its internal representations and computational circuits. Our findings reveal that induction heads play a central role in implementing the rule completion and rule chaining steps involved in the logical inference required by the task.
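The rule-completion behavior attributed to induction heads can be sketched as a toy algorithm: on seeing a repeated token, the head attends back to the token that followed its earlier occurrence and copies that continuation. The sketch below is illustrative only (the function name and token encoding are ours, not the paper's); a real induction head implements this softly via attention patterns, not exact matching.

```python
def induction_completion(tokens):
    """Toy induction-head behavior: predict the next token by locating
    the most recent earlier occurrence of the current token and copying
    whatever followed it ([A][B] ... [A] -> [B])."""
    current = tokens[-1]
    # Scan backwards over earlier positions for a match with the current token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy the continuation of the earlier match
    return None  # no earlier occurrence: no induction-based prediction

# Rule completion: having seen the rule "X implies Y", the prompt
# "X implies" is completed with "Y".
seq = ["X", "implies", "Y", ".", "X", "implies"]
print(induction_completion(seq))  # -> "Y"
```

Rule chaining can then be viewed as repeated applications of this step: completing "A implies" with "B" makes "B" the new query token, whose own rule "B implies C" is completed in the next pass.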