🤖 AI Summary
This study investigates the internal mechanisms underlying arithmetic in large language models, focusing on how next-token predictions are built up across layers. Using early decoding, inter-layer activation analysis, and module ablation, the work traces the prediction process layer by layer. It finds that models recognize an arithmetic task early, but the correct result emerges only in the final layers. Models that are proficient in arithmetic show a clear division of labor: attention modules propagate input information, while MLP modules aggregate it. This division is absent in less proficient models, and the way proficient models handle harder problems suggests that their performance reflects reasoning beyond factual recall.
📝 Abstract
Large language models (LLMs) have demonstrated impressive capabilities, yet their internal mechanisms for handling reasoning-intensive tasks remain underexplored. To better understand this internal processing, we investigate how LLMs perform arithmetic operations by examining their internal computations during task execution. Using early decoding, we trace how next-token predictions are constructed across layers. Our experiments reveal that while the models recognize arithmetic tasks early, correct result generation occurs only in the final layers. Notably, models proficient in arithmetic exhibit a clear division of labor between attention and MLP modules, where attention modules propagate input information and MLP modules aggregate it. This division is absent in less proficient models. Furthermore, successful models appear to process more challenging arithmetic tasks functionally, that is, by computing the result rather than retrieving it, suggesting reasoning capabilities beyond factual recall.
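As a rough illustration of the early decoding idea mentioned above, the sketch below projects a toy residual stream onto the vocabulary after every layer and records the top token each time. Everything in it is an assumption made for illustration: the three-token vocabulary, the identity unembedding matrix, the hand-picked attention and MLP updates, and the function name `early_decode` are hypothetical and are not taken from the paper or from any library. Real implementations typically also apply the model's final normalization before unembedding, which is omitted here. The numbers are chosen so that the toy trace mimics the reported pattern (a number token appears early, the correct answer only at the end); it is not evidence for that pattern.

```python
# Toy sketch of early decoding (sometimes called the "logit lens").
# All values are invented for illustration; none come from a real model.

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def early_decode(embedding, layer_updates, unembed, vocab):
    """Return the top-scoring token after each layer.

    layer_updates is a list of (attention_output, mlp_output) pairs that
    are added to the residual stream in order, one pair per layer.
    """
    hidden = list(embedding)
    trace = []
    for attn_out, mlp_out in layer_updates:
        hidden = add(hidden, attn_out)  # attention writes into the stream
        hidden = add(hidden, mlp_out)   # MLP writes into the stream
        logits = matvec(unembed, hidden)  # decode the intermediate state
        best = max(range(len(vocab)), key=lambda i: logits[i])
        trace.append(vocab[best])
    return trace

# Hypothetical setup for a prompt such as "5 + 7 =" (correct answer "12").
vocab = ["the", "7", "12"]
unembed = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]  # identity matrix, assumed for simplicity
embedding = [1.0, 0.0, 0.0]
layer_updates = [
    ([0.0, 0.8, 0.0], [0.0, 0.5, 0.0]),  # layer 1: a number token takes over
    ([0.0, 0.2, 0.3], [0.0, 0.0, 0.4]),  # layer 2: still the wrong number
    ([0.0, 0.0, 0.2], [0.0, 0.0, 1.0]),  # layer 3: correct answer wins
]

for layer, token in enumerate(
        early_decode(embedding, layer_updates, unembed, vocab), start=1):
    print(f"layer {layer}: top token = {token}")
```

Keeping the attention and MLP contributions as separate vectors also makes it easy to zero one of them out for a given layer, which is one simple way to probe each module's role in a toy setting like this.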