🤖 AI Summary
This work investigates the mechanistic basis of few-shot integer addition under in-context learning (ICL) in Transformer models. Using Llama-3-8B, we conduct attribution analysis, layer-wise signal tracing, and activation-subspace decomposition. We identify, for the first time, that only three attention heads dominate generalization performance. Crucially, these heads operate within a six-dimensional subspace: four dimensions encode the unit digit and two encode overall magnitude, enabling semantic separation. We further propose a "self-correcting" mechanism that explains how later in-context examples dynamically suppress earlier erroneous predictions. Experiments across integers k ∈ [−9, 9] achieve >95% accuracy; the key heads are precisely localized, and the subspace structure and self-correction mechanism are empirically validated. Our findings establish a new paradigm for interpretable, modular modeling of ICL, advancing the mechanistic understanding of arithmetic reasoning in large language models.
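The activation-subspace decomposition mentioned above can be illustrated with a minimal, self-contained sketch. The snippet below is not the paper's pipeline: it uses synthetic activations in place of real Llama-3-8B head outputs, and simply shows how an SVD of (centered) activation vectors recovers a low-dimensional subspace, here rank 6 to mirror the summary's finding.

```python
import numpy as np

# Hypothetical sketch: extract a low-dimensional subspace from a matrix of
# head-output activations (one row per prompt). Synthetic data stands in for
# real activations collected from the model's forward pass.
rng = np.random.default_rng(0)
d_model, n_prompts, rank = 128, 200, 6

# Synthetic activations that truly lie in a rank-6 subspace, plus small noise.
basis = rng.standard_normal((rank, d_model))
coeffs = rng.standard_normal((n_prompts, rank))
acts = coeffs @ basis + 0.01 * rng.standard_normal((n_prompts, d_model))

# SVD of the centered activations: the top-6 right singular vectors span the
# extracted subspace, and the spectrum shows how much variance it captures.
U, S, Vt = np.linalg.svd(acts - acts.mean(axis=0), full_matrices=False)
subspace = Vt[:rank]                            # (6, d_model) orthonormal basis
explained = (S[:rank] ** 2).sum() / (S ** 2).sum()
```

In the paper's setting one would further inspect such a basis to see which directions track the unit digit and which track magnitude; here `explained` merely confirms the rank-6 structure.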
📝 Abstract
To perform in-context learning, language models must extract signals from individual few-shot examples, aggregate these into a learned prediction rule, and then apply this rule to new examples. How is this implemented in the forward pass of modern transformer models? To study this, we consider a structured family of few-shot learning tasks for which the true prediction rule is to add an integer $k$ to the input. We find that Llama-3-8B attains high accuracy on this task for a range of $k$, and localize its few-shot ability to just three attention heads via a novel optimization approach. We further show the extracted signals lie in a six-dimensional subspace, where four of the dimensions track the unit digit and the other two dimensions track overall magnitude. We finally examine how these heads extract information from individual few-shot examples, identifying a self-correction mechanism in which mistakes from earlier examples are suppressed by later examples. Our results demonstrate how tracking low-dimensional subspaces across a forward pass can provide insight into fine-grained computational structures.
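The task family studied in the abstract is simple enough to sketch directly. The snippet below builds a few-shot prompt whose hidden rule is y = x + k; the "input -> output" template, the operand range, and the helper name `make_prompt` are illustrative assumptions, not the paper's actual prompt format.

```python
import random

def make_prompt(k, n_examples=4, seed=0, lo=0, hi=99):
    """Build a few-shot prompt whose hidden rule is y = x + k.

    Returns the prompt string and the expected answer for the final query.
    NOTE: the "x -> y" template is an assumed format for illustration only.
    """
    rng = random.Random(seed)
    xs = [rng.randint(lo, hi) for _ in range(n_examples + 1)]
    # Demonstration pairs: the model must infer k from these examples.
    demos = "\n".join(f"{x} -> {x + k}" for x in xs[:-1])
    query = xs[-1]
    prompt = f"{demos}\n{query} -> "
    return prompt, query + k

prompt, answer = make_prompt(k=3, n_examples=4, seed=42)
```

A model that has extracted the rule from the demonstrations should complete the final line with `answer`; sweeping `k` over [−9, 9] reproduces the task range used in the paper.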