🤖 AI Summary
This study investigates how large language models acquire abstract logical reasoning capabilities from few-shot examples and identifies the internal mechanisms involved. The authors use a symbolic-aided chain-of-thought prompting framework that aligns discrete reasoning steps with token-level logits, then apply causal mediation analysis to pinpoint the attention heads responsible for specific reasoning functions. The work reveals a hierarchical reasoning organization within the model: a small set of specialized attention heads (approximately 3% of all heads) handles low-level retrieval of facts and rules, while higher layers predominantly handle information integration and global strategies such as graph traversal. Notably, the token positions that steer the reasoning process tend to have low confidence scores, suggesting that the model performs structured reasoning under uncertainty.
📝 Abstract
Recent studies have shown that Large Language Models (LLMs) can achieve strong reasoning performance in few-shot learning settings by incorporating functional symbolic representations that abstractly describe graph traversal algorithms and step-by-step reasoning. However, it remains unclear how LLMs come to understand the abstract meaning of each reasoning step, and of the overall algorithm, from only a limited number of demonstrations. This work aims to localize the attention heads responsible for individual reasoning steps and to characterize the types of information transferred among them. We first align constituent reasoning steps with their corresponding token logits under a symbolic-aided Chain-of-Thought (CoT) prompting framework. Our analysis shows that the token positions that steer the reasoning process are associated with low confidence scores, which arise from the constraint of conforming to the reasoning behavior patterns in the demonstrations. We then adopt causal mediation analysis techniques to identify the attention heads responsible for these patterns. Our findings further indicate that LLMs retrieve factual and rule-based information for individual sub-reasoning tasks through specialized attention heads (approximately 3% of all heads), whereas higher layers predominantly facilitate information integration and the emergence of global reasoning strategies (e.g., graph traversal algorithms) that coordinate multiple intermediate reasoning steps to solve the overall task.
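As a rough illustration of the logit-alignment step described above, the sketch below computes a per-token confidence score from a matrix of logits and flags low-confidence positions. It is not the authors' code: the confidence definition (softmax probability of the emitted token), the `threshold` value, and the function names are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def token_confidence(logits, token_ids):
    """Probability the model assigned to each emitted token.

    logits:    array of shape (T, V), one row of vocabulary logits per position
    token_ids: array of shape (T,), the token actually emitted at each position
    """
    probs = softmax(np.asarray(logits, dtype=float), axis=-1)
    token_ids = np.asarray(token_ids)
    return probs[np.arange(len(token_ids)), token_ids]

def low_confidence_positions(logits, token_ids, threshold=0.5):
    """Indices whose confidence falls below `threshold` (an assumed cutoff)."""
    conf = token_confidence(logits, token_ids)
    return np.flatnonzero(conf < threshold), conf

# Toy example: 3 positions, vocabulary of 3 tokens (made-up numbers).
toy_logits = np.array([
    [5.0, 0.0, 0.0],   # sharply peaked: confident
    [0.2, 0.1, 0.0],   # nearly flat: uncertain
    [0.0, 6.0, 0.0],   # sharply peaked: confident
])
toy_tokens = np.array([0, 0, 1])

positions, conf = low_confidence_positions(toy_logits, toy_tokens)
print(positions, np.round(conf, 3))
```

In this toy input only the middle position is flagged, because its logits are almost uniform. In an analysis like the one the abstract describes, flagged positions would then be the candidates for causal mediation analysis over attention heads.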