🤖 AI Summary
This work investigates how Transformer models perform in-context learning (ICL), acquiring and applying novel associations from their input context without any weight updates, and focuses on the mechanism by which *induction heads* emerge.
Method: Using a simplified two-layer Transformer, a minimal ICL task formulation, and a modified architecture, the study combines a theoretical analysis of the training dynamics with empirical dimensionality reduction of the training trajectories in parameter space.
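To illustrate the dimensionality-reduction side of the method, here is a minimal synthetic sketch (not the paper's code; the trajectory model, shapes, and names are all assumptions): project parameter snapshots from training onto their principal components and count how many directions explain the variance.

```python
import numpy as np

# Hypothetical illustration: estimate the effective dimensionality of a
# training trajectory via PCA on parameter snapshots. We simulate a
# trajectory that secretly lives in a 3-dimensional subspace plus small
# noise, mimicking low-dimensional training dynamics.
rng = np.random.default_rng(0)

n_steps, n_params = 500, 1000
basis = rng.normal(size=(3, n_params))          # 3 hidden directions
coeffs = 10 * rng.normal(size=(n_steps, 3))     # coordinates along them
trajectory = coeffs @ basis + 0.01 * rng.normal(size=(n_steps, n_params))

# PCA via SVD of the centered trajectory.
centered = trajectory - trajectory.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Count components needed to explain 99% of the variance.
effective_dim = int(np.searchsorted(np.cumsum(variance_ratio), 0.99) + 1)
print(effective_dim)  # → 3 for this synthetic trajectory
```

The same projection applied to real training checkpoints is one way to test whether dynamics in a huge parameter space are effectively low-dimensional.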
Contribution/Results: The training dynamics are proven to remain confined to a 19-dimensional subspace of the parameter space; empirically, only 3 of those dimensions account for the emergence of an induction head; and the time until an induction head emerges obeys a tight asymptotic bound that is quadratic in the input context length. Theory and experiments together confirm that parameter evolution stays within this low-dimensional subspace, yielding a mathematically interpretable, empirically verified account of the mechanistic origin of induction heads in ICL.
📝 Abstract
Transformers have become the dominant architecture for natural language processing. Part of their success is owed to a remarkable capability known as in-context learning (ICL): they can acquire and apply novel associations solely from their input context, without any updates to their weights. In this work, we study the emergence of induction heads, a previously identified mechanism in two-layer transformers that is particularly important for in-context learning. We uncover a relatively simple and interpretable structure of the weight matrices implementing the induction head. We theoretically explain the origin of this structure using a minimal ICL task formulation and a modified transformer architecture. We give a formal proof that the training dynamics remain constrained to a 19-dimensional subspace of the parameter space. Empirically, we validate this constraint while observing that only 3 dimensions account for the emergence of an induction head. By further studying the training dynamics inside this 3-dimensional subspace, we find that the time until the emergence of an induction head follows a tight asymptotic bound that is quadratic in the input context length.
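The abstract's "minimal ICL task formulation" is not spelled out here, but a standard induction-head probe has this shape: a bigram (a, b) appears once in the context, the trigger token a reappears at the end, and the correct continuation b must be copied from context rather than recalled from the weights. A hypothetical sketch (the paper's exact task may differ, and all names below are assumptions):

```python
import random

def make_induction_example(vocab_size=50, n_filler=8, seed=None):
    """Build one toy sequence: [a, b, filler..., a], with target b."""
    rng = random.Random(seed)
    a, b = rng.sample(range(vocab_size), 2)
    # Filler tokens avoid `a`, so the bigram (a, b) is unambiguous
    # and `b` is recoverable only by in-context pattern matching.
    others = [t for t in range(vocab_size) if t != a]
    filler = [rng.choice(others) for _ in range(n_filler)]
    tokens = [a, b] + filler + [a]
    return tokens, b

tokens, target = make_induction_example(seed=0)
assert tokens[-1] == tokens[0]   # the trigger token reappears
assert target == tokens[1]       # the answer is the in-context successor
```

An induction head solves this by attending from the final a back to the token that followed its earlier occurrence, which is why such probes are used to detect when the mechanism has formed.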