Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions

📅 2026-02-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how Meta-Llama-3-8B (base) finalizes the result of three-digit addition at the last input token once cross-token information routing is complete. Using causal residual patching and cumulative attention ablation, the authors identify a post-routing boundary near layer 17. Beyond it, the decoded sum is controlled almost entirely by the last token, and late-layer self-attention is largely dispensable. In this regime, digit direction dictionaries differ across next-higher-digit contexts but are related by an approximately orthogonal map within a shared low-rank subspace, fitted by low-rank Procrustes alignment. Causal editing follows the same geometry: naive cross-context transfer of directions fails, whereas rotating the directions through the learned map restores strict counterfactual digit edits. Negative controls do not recover these edits, which supports the specificity of the mechanism.

📝 Abstract

We study three-digit addition in Meta-Llama-3-8B (base) under a one-token readout to characterize how arithmetic answers are finalized after cross-token routing becomes causally irrelevant. Causal residual patching and cumulative attention ablations localize a sharp boundary near layer 17: beyond it, the decoded sum is controlled almost entirely by the last input token and late-layer self-attention is largely dispensable. In this post-routing regime, digit(-sum) direction dictionaries vary with a next-higher-digit context but are well-related by an approximately orthogonal map inside a shared low-rank subspace (low-rank Procrustes alignment). Causal digit editing matches this geometry: naive cross-context transfer fails, while rotating directions through the learned map restores strict counterfactual edits; negative controls do not recover.
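To make the alignment step concrete, here is a minimal NumPy sketch of a low-rank orthogonal Procrustes fit between two direction dictionaries. It is an illustration under stated assumptions, not the authors' code: the function names `low_rank_procrustes` and `transfer` are hypothetical, the shared subspace is assumed to come from an SVD of the stacked dictionaries, and the data are synthetic toy matrices rather than Llama-3 activations.

```python
import numpy as np

def low_rank_procrustes(D_src, D_tgt, rank):
    """Fit an orthogonal map between two direction dictionaries in a shared subspace.

    D_src, D_tgt: arrays of shape (n_digits, d_model), one direction per digit,
    each taken from a different context. Returns (B, R) where B is a
    (d_model, rank) orthonormal basis and R is a (rank, rank) orthogonal matrix
    such that (D_src @ B) @ R approximates D_tgt @ B.
    """
    # Assumption: the shared subspace is spanned by the top right singular
    # vectors of the two dictionaries stacked together.
    _, _, Vt = np.linalg.svd(np.vstack([D_src, D_tgt]), full_matrices=False)
    B = Vt[:rank].T
    X, Y = D_src @ B, D_tgt @ B
    # Orthogonal Procrustes: minimise ||X R - Y||_F subject to R^T R = I.
    # The minimiser is R = U W^T, where U S W^T is the SVD of X^T Y.
    U, _, Wt = np.linalg.svd(X.T @ Y)
    return B, U @ Wt

def transfer(D_src, B, R):
    """Rotate source directions into the target context, back in model space."""
    return (D_src @ B) @ R @ B.T

# Synthetic toy data: 10 digit directions in a 4-dimensional subspace of a
# 32-dimensional space; the target context is an exact rotation of the source.
rng = np.random.default_rng(0)
n_digits, d_model, rank = 10, 32, 4
B0 = np.linalg.qr(rng.standard_normal((d_model, rank)))[0]  # hidden basis
Q = np.linalg.qr(rng.standard_normal((rank, rank)))[0]      # hidden rotation
coords = rng.standard_normal((n_digits, rank))
D_src = coords @ B0.T
D_tgt = coords @ Q @ B0.T

B, R = low_rank_procrustes(D_src, D_tgt, rank)
naive_err = np.linalg.norm(D_src - D_tgt)                     # no alignment
rotated_err = np.linalg.norm(transfer(D_src, B, R) - D_tgt)   # after rotation
print(f"naive error: {naive_err:.3e}, rotated error: {rotated_err:.3e}")
```

In this toy setting the source directions do not match the target directions until they are rotated through the fitted map, which mirrors the qualitative contrast the abstract draws between naive cross-context transfer and rotation-based transfer. With real activations the fit would be approximate rather than exact, and the rank would have to be chosen empirically.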

Problem

Research questions and friction points this paper is trying to address.

post-routing arithmetic

last-token readout

digit direction

low-rank subspace

causal editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

post-routing arithmetic

low-rank Procrustes alignment

digit direction geometry