🤖 AI Summary
This work investigates the core mechanisms underlying in-context learning (ICL) in large language models (LLMs), specifically examining the relative roles of induction heads and function vector (FV) heads. Using cross-model attention-head localization, targeted ablation, and activation-pattern analysis across 12 prominent models, including LLaMA and Qwen, we provide the first empirical evidence that FV heads are the primary drivers of few-shot ICL, consistently contributing more than induction heads, especially in larger models. Moreover, approximately 60% of FV heads undergo a functional transition mid-training, shifting from induction-like behavior to FV functionality. Based on these findings, we propose a novel training-dynamics hypothesis: *FV heads evolve from induction heads*, offering a unifying account of their relationship. We further support this hypothesis through task-wise latent-space separability analysis, confirming its plausibility.
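To make the induction-head mechanism concrete, the copy rule it implements can be sketched in plain Python: find the most recent earlier occurrence of the current token and predict the token that followed it ([A][B] ... [A] → [B]). This is an illustrative toy, not the attention-level implementation studied in the paper.

```python
def induction_predict(tokens):
    """Toy induction-head copy rule: locate the most recent earlier
    occurrence of the last token and copy its successor."""
    last = tokens[-1]
    # scan backwards over earlier positions
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]  # copy the token that followed the match
    return None  # no earlier occurrence: the rule makes no prediction

print(induction_predict(["the", "cat", "sat", "the"]))  # -> "cat"
```

Real induction heads realize this pattern inside attention: a previous-token head writes positional information that a later head uses to attend one position past the earlier match.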
📝 Abstract
Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads whose activations compute a latent encoding of the ICL task. To better understand which of the two distinct mechanisms drives ICL, we study and compare induction heads and FV heads in 12 language models. Through detailed ablations, we discover that few-shot ICL performance depends primarily on FV heads, especially in larger models. In addition, we uncover that FV and induction heads are connected: many FV heads start as induction heads during training before transitioning to the FV mechanism. This leads us to speculate that induction facilitates learning the more complex FV mechanism that ultimately drives ICL.
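The head-level ablations described above can be sketched as zeroing one head's output inside a multi-head attention layer and measuring the effect on the model's output. The toy module and shapes below are illustrative assumptions, not the paper's actual models or ablation protocol (which may use mean-ablation rather than zero-ablation).

```python
# Minimal sketch of zero-ablating a single attention head, assuming a toy
# self-attention layer; dimensions and head indices are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyMHA(nn.Module):
    """Multi-head self-attention exposing per-head outputs for ablation."""
    def __init__(self, d_model=16, n_heads=4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.ablate_head = None  # index of head to zero out, or None

    def forward(self, x):
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each to (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head**0.5, dim=-1)
        heads = attn @ v  # per-head outputs: (B, n_heads, T, d_head)
        if self.ablate_head is not None:
            heads = heads.clone()
            heads[:, self.ablate_head] = 0.0  # zero-ablate one head
        return self.out(heads.transpose(1, 2).reshape(B, T, D))

mha = ToyMHA()
x = torch.randn(2, 5, 16)
clean = mha(x)
mha.ablate_head = 2
ablated = mha(x)
effect = (clean - ablated).abs().mean().item()  # size of the ablation effect
print(f"mean |delta| from ablating head 2: {effect:.4f}")
```

In the study, ranking heads by how much their ablation degrades few-shot ICL accuracy is what separates the contribution of FV heads from that of induction heads.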