🤖 AI Summary
This paper addresses the lack of a mechanistic understanding of in-context learning (ICL) in large language models. It first decouples ICL into two components, task recognition (TR) and task learning (TL), and proposes the Task Subspace Logit Attribution (TSLA) framework. Combining attention-head-level geometric analysis, attribution measurements, and hidden-state steering experiments, the authors systematically identify distinct attention heads specialized for TR and TL, empirically validating their functional independence and synergy. TSLA unifies prior observations, including induction heads and task vectors, within a single interpretable, decomposable, and intervenable account of the ICL mechanism. It enables mechanism-level modeling and controllable intervention across multiple tasks, enhancing the transparency and controllability of ICL.
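The core attribution idea can be sketched in a few lines: route each attention head's residual-stream contribution through a low-dimensional task subspace before unembedding it into label logits. The sketch below is illustrative only; the subspace construction (top principal directions of task-conditioned hidden states), the variable names, and `tsla_attribution` are assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_tasks, n_labels, k = 64, 6, 4, 3

# Hypothetical stand-ins: mean last-token hidden states for several tasks,
# and the unembedding rows of the label tokens.
task_means = rng.standard_normal((n_tasks, d_model))
W_U = rng.standard_normal((n_labels, d_model))

# One plausible "task subspace": the top-k principal directions of the
# task-mean hidden states.
centered = task_means - task_means.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:k].T                   # (d_model, k) orthonormal directions
P = basis @ basis.T                # orthogonal projector onto the subspace

def tsla_attribution(head_output):
    """Logit attribution routed through the task subspace:
    project the head's residual-stream write, then unembed."""
    return W_U @ (P @ head_output)  # one attribution score per label

head_out = rng.standard_normal(d_model)
scores = tsla_attribution(head_out)
```

Heads whose in-subspace writes contribute strongly to the correct label under such a score would be the natural candidates for TR/TL specialization.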
📝 Abstract
We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework based on Task Subspace Logit Attribution (TSLA) to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Combining steering experiments with geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within the subspace toward the correct label to facilitate prediction. We further show how previous findings on ICL mechanisms, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how large language models execute ICL across diverse tasks and settings.
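The geometric picture in the abstract, TR as alignment with the task subspace and TL as rotation within it toward the label, can be made concrete with a toy sketch. Everything here is hypothetical scaffolding: the subspace, the label direction, and the two update rules are minimal illustrations of the claimed geometry, not the paper's measured head effects.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, k = 64, 3

# Hypothetical orthonormal task-subspace basis and an in-subspace
# label direction; h is a hidden state with mass outside the subspace.
basis, _ = np.linalg.qr(rng.standard_normal((d_model, k)))
P = basis @ basis.T
label_dir = basis @ np.array([1.0, 0.0, 0.0])  # unit vector in the subspace
h = rng.standard_normal(d_model)

def alignment(x):
    """Fraction of the vector's norm lying inside the task subspace."""
    return np.linalg.norm(P @ x) / np.linalg.norm(x)

def label_cosine(x):
    """Cosine between the in-subspace component and the label direction."""
    xp = P @ x
    return xp @ label_dir / (np.linalg.norm(xp) * np.linalg.norm(label_dir))

# "TR-like" step: pull the state toward its own projection,
# increasing alignment with the subspace.
h_tr = 0.5 * h + 0.5 * (P @ h)

# "TL-like" step: rotate the in-subspace part toward the label
# direction while leaving the out-of-subspace part unchanged.
h_tl = (h_tr - P @ h_tr) \
    + 0.5 * (P @ h_tr) \
    + 0.5 * np.linalg.norm(P @ h_tr) * label_dir
```

Under this toy dynamic, the TR step raises `alignment` without touching the in-subspace direction, and the TL step raises `label_cosine` without touching the out-of-subspace component, mirroring the claimed division of labor between the two head families.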