🤖 AI Summary
The mechanistic basis of implicit mental arithmetic in large language models (LLMs)—specifically, whether causal self-attention plus MLP layers actually support full-sequence, cross-token computation—remains a black-box question.
Method: We identify and validate a novel subgraph, *All-for-One* (AF1), characterized by computation highly concentrated at the final token in deep layers and information aggregation through a sparse set of middle layers. We introduce two diagnostic techniques, Context-Aware Mean Ablation (CAMA) and Attention-Based Peeking (ABP), enabling precise identification and disruption of critical information-flow paths.
Contribution/Results: AF1 transfers across models and is robust to variations in input format. Empirical evaluation across diverse LLMs and arithmetic expressions confirms it as both sufficient and necessary for implicit mental arithmetic; targeted interventions yield significantly higher accuracy than baseline ablation methods. Our findings uncover a fundamental organizational principle underlying LLMs' implicit arithmetic reasoning.
📝 Abstract
Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and multilayer perceptron layers allows every token to access and compute information based on all preceding tokens. In practice, to what extent are such operations present? In this paper, on mental math tasks (i.e., direct math calculation via next-token prediction without explicit reasoning), we investigate this question in three steps: inhibiting input-specific token computations in the initial layers, restricting the routes of information transfer across token positions in the next few layers, and forcing all computation to happen at the last token in the remaining layers. With two proposed techniques, Context-Aware Mean Ablation (CAMA) and Attention-Based Peeking (ABP), we identify an All-for-One subgraph (AF1) with high accuracy on a wide variety of mental math tasks, where meaningful computation occurs very late (in terms of layer depth) and only at the last token, which receives information from other tokens in a few specific middle layers. Experiments on a variety of models and arithmetic expressions show that this subgraph is sufficient and necessary for high model performance, transfers across different models, and works on a variety of input styles. Ablations on different CAMA and ABP alternatives reveal their unique advantages over other methods, which may be of independent interest.
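The two interventions above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, shapes, and the exact averaging/masking rules are illustrative assumptions: CAMA is sketched as replacing every non-final token's hidden state with a mean computed over a set of comparable inputs, and ABP as an attention mask that lets only the last token attend to earlier positions, and only in designated "peeking" layers.

```python
import numpy as np

def cama(h, baseline):
    """Context-Aware Mean Ablation (sketch, not the paper's exact recipe).

    h        : (seq, d) hidden states for one input at some early layer.
    baseline : (n, seq, d) hidden states from n comparable inputs.

    Every position except the last token is overwritten with the
    position-wise mean over the baseline set, inhibiting input-specific
    computation there while leaving the last token intact.
    """
    out = h.copy()
    out[:-1] = baseline.mean(axis=0)[:-1]
    return out

def abp_mask(seq_len, peeking):
    """Attention-Based Peeking (sketch): boolean attention mask.

    If peeking is False, every token attends only to itself (no
    cross-position information transfer). If peeking is True, the last
    token may additionally attend to ("peek at") all earlier positions,
    modeling the few middle layers where aggregation is allowed.
    """
    mask = np.eye(seq_len, dtype=bool)  # self-attention only
    if peeking:
        mask[-1, :] = True              # last token sees everything
    return mask
```

A subgraph search in this style would sweep which layers apply `cama` and which layers get a peeking mask, then measure task accuracy under each configuration.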