🤖 AI Summary
This work investigates whether large language models (LLMs) possess intrinsic, structured mechanisms for abstract reasoning, and whether those mechanisms are robust. Using Llama3-70B as a testbed, we empirically identify an emergent three-stage symbolic architecture implemented by attention heads: in early layers, symbol abstraction heads map input tokens to abstract variables based on the relations between those tokens; in intermediate layers, symbolic induction heads perform sequence induction over those variables; and in later layers, retrieval heads predict the next token by retrieving the value associated with the predicted variable. We validate the architecture via attention interpretability analysis, causal mediation testing, inter-layer functional disentanglement, and abstract variable tracking. The architecture generalizes across diverse abstract reasoning tasks and is robust to perturbations. Our findings provide systematic evidence for endogenous symbolic processing in neural networks and point toward a resolution of the longstanding debate between symbolic and neural network approaches.
📝 Abstract
Many recent studies have found evidence for emergent reasoning capabilities in large language models, but debate persists concerning the robustness of these capabilities, and the extent to which they depend on structured reasoning mechanisms. To shed light on these issues, we perform a comprehensive study of the internal mechanisms that support abstract rule induction in an open-source language model (Llama3-70B). We identify an emergent symbolic architecture that implements abstract reasoning via a series of three computations. In early layers, symbol abstraction heads convert input tokens to abstract variables based on the relations between those tokens. In intermediate layers, symbolic induction heads perform sequence induction over these abstract variables. Finally, in later layers, retrieval heads predict the next token by retrieving the value associated with the predicted abstract variable. These results point toward a resolution of the longstanding debate between symbolic and neural network approaches, suggesting that emergent reasoning in neural networks depends on the emergence of symbolic mechanisms.
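The three computations described above can be illustrated with a purely symbolic toy analogue. The sketch below is not the model's mechanism or the authors' code. It assumes an identity-rule task (such as ABA or ABB sequences over arbitrary tokens), and every function name in it is hypothetical. It only shows how abstraction, induction over variables, and retrieval compose into a next-token prediction.

```python
from collections import Counter


def abstract_variables(tokens):
    """Stage 1 (toy): label each token by its identity relation to earlier tokens.

    The first distinct token becomes "A", the second "B", and so on, so
    ["la", "ti", "la"] maps to ["A", "B", "A"] regardless of the tokens used.
    """
    bindings = {}  # token -> abstract variable, in order of first appearance
    variables = []
    for tok in tokens:
        if tok not in bindings:
            bindings[tok] = chr(ord("A") + len(bindings))
        variables.append(bindings[tok])
    return variables, bindings


def induce_next_variable(example_var_seqs, query_vars):
    """Stage 2 (toy): induction over variable sequences, not over raw tokens.

    Finds in-context examples whose variable sequence starts with the query's
    variable prefix and takes a majority vote on the variable that follows.
    """
    n = len(query_vars)
    votes = Counter(
        seq[n]
        for seq in example_var_seqs
        if len(seq) > n and seq[:n] == query_vars
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def retrieve_token(variable, bindings):
    """Stage 3 (toy): return the token bound to the predicted variable."""
    for tok, var in bindings.items():
        if var == variable:
            return tok
    return None


def predict_next_token(examples, query):
    """Compose the three stages: abstract, induce, retrieve."""
    example_var_seqs = [abstract_variables(ex)[0] for ex in examples]
    query_vars, query_bindings = abstract_variables(query)
    next_var = induce_next_variable(example_var_seqs, query_vars)
    return retrieve_token(next_var, query_bindings)


if __name__ == "__main__":
    aba_examples = [["la", "ti", "la"], ["mo", "ke", "mo"]]
    print(predict_next_token(aba_examples, ["su", "do"]))
```

Because induction operates on the variables rather than the tokens, the same code completes a query built from tokens that never appear in the examples, which is the property that makes the abstraction step useful in this toy setting.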