🤖 AI Summary
While large language models exhibit logical reasoning capabilities, their internal mechanisms remain poorly understood—particularly how they implement symbolic deduction. Method: This work investigates small-scale transformer models on deductive reasoning tasks, introducing a mechanistic interpretability framework to systematically analyze model representations and computational circuits. Contribution/Results: We identify induction heads as the core architectural components responsible for rule completion and rule chaining—revealing, for the first time at the neuron level, how models explicitly learn and execute logical rules rather than relying on statistical correlations. Empirical evaluation confirms that small models internalize formal rules and faithfully reproduce human-like step-by-step inference. Our analysis provides the first fine-grained, empirically verifiable neural account of symbolic reasoning in language models, establishing a mechanistic basis for understanding rule-based deduction in neural architectures.
📝 Abstract
Recent large language models have demonstrated notable capabilities in solving problems that require logical reasoning; however, the corresponding internal mechanisms remain largely unexplored. In this paper, we show that a small language model can solve a deductive reasoning task by learning the underlying rules, rather than operating as a purely statistical learner. We then provide a low-level explanation of its internal representations and computational circuits. Our findings reveal that induction heads play a central role in implementing the rule completion and rule chaining steps involved in the logical inference required by the task.
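The rule-completion behavior attributed to induction heads can be sketched as a toy algorithm: on seeing a repeated token, the head attends back to the token that followed its earlier occurrence and copies that continuation. The sketch below is illustrative only (the function name and token encoding are ours, not the paper's); a real induction head implements this softly via attention patterns, not exact matching.

```python
def induction_completion(tokens):
    """Toy induction-head behavior: predict the next token by locating
    the most recent earlier occurrence of the current token and copying
    whatever followed it ([A][B] ... [A] -> [B])."""
    current = tokens[-1]
    # Scan backwards over earlier positions for a match with the current token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy the continuation of the earlier match
    return None  # no earlier occurrence: no induction-based prediction

# Rule completion: having seen the rule "X implies Y", the prompt
# "X implies" is completed with "Y".
seq = ["X", "implies", "Y", ".", "X", "implies"]
print(induction_completion(seq))  # -> "Y"
```

Rule chaining can then be viewed as repeated applications of this step: completing "A implies" with "B" makes "B" the new query token, whose own rule "B implies C" is completed in the next pass.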