🤖 AI Summary
This paper addresses the lack of a mechanistic understanding of in-context learning (ICL) in large language models. It first decouples ICL into two components, task recognition (TR) and task learning (TL), and proposes the Task Subspace Logit Attribution (TSLA) framework. Combining attention-head-level geometric analysis, attribution measurements, and hidden-state steering experiments, the authors systematically identify distinct attention heads specialized for TR and TL, empirically validating their functional independence and synergy. TSLA unifies prior observations, including induction heads and task vectors, within a single interpretable, decomposable, and intervenable account of the ICL mechanism. It enables mechanism-level modeling and controllable intervention across multiple tasks, enhancing the transparency and controllability of ICL.
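The core attribution idea can be sketched in a few lines: route each attention head's residual-stream contribution through a low-dimensional task subspace before unembedding it into label logits. The sketch below is illustrative only; the subspace construction (top principal directions of task-conditioned hidden states), the variable names, and `tsla_attribution` are assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_tasks, n_labels, k = 64, 6, 4, 3

# Hypothetical stand-ins: mean last-token hidden states for several tasks,
# and the unembedding rows of the label tokens.
task_means = rng.standard_normal((n_tasks, d_model))
W_U = rng.standard_normal((n_labels, d_model))

# One plausible "task subspace": the top-k principal directions of the
# task-mean hidden states.
centered = task_means - task_means.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:k].T                   # (d_model, k) orthonormal directions
P = basis @ basis.T                # orthogonal projector onto the subspace

def tsla_attribution(head_output):
    """Logit attribution routed through the task subspace:
    project the head's residual-stream write, then unembed."""
    return W_U @ (P @ head_output)  # one attribution score per label

head_out = rng.standard_normal(d_model)
scores = tsla_attribution(head_out)
```

Heads whose in-subspace writes contribute strongly to the correct label under such a score would be the natural candidates for TR/TL specialization.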
📝 Abstract
We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework based on Task Subspace Logit Attribution (TSLA) to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Combining steering experiments with geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within the subspace toward the correct label to facilitate prediction. We further show how previous findings on ICL mechanisms, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how large language models execute ICL across diverse tasks and settings.
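The geometric picture in the abstract, TR as alignment with the task subspace and TL as rotation within it toward the label, can be made concrete with a toy sketch. Everything here is hypothetical scaffolding: the subspace, the label direction, and the two update rules are minimal illustrations of the claimed geometry, not the paper's measured head effects.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, k = 64, 3

# Hypothetical orthonormal task-subspace basis and an in-subspace
# label direction; h is a hidden state with mass outside the subspace.
basis, _ = np.linalg.qr(rng.standard_normal((d_model, k)))
P = basis @ basis.T
label_dir = basis @ np.array([1.0, 0.0, 0.0])  # unit vector in the subspace
h = rng.standard_normal(d_model)

def alignment(x):
    """Fraction of the vector's norm lying inside the task subspace."""
    return np.linalg.norm(P @ x) / np.linalg.norm(x)

def label_cosine(x):
    """Cosine between the in-subspace component and the label direction."""
    xp = P @ x
    return xp @ label_dir / (np.linalg.norm(xp) * np.linalg.norm(label_dir))

# "TR-like" step: pull the state toward its own projection,
# increasing alignment with the subspace.
h_tr = 0.5 * h + 0.5 * (P @ h)

# "TL-like" step: rotate the in-subspace part toward the label
# direction while leaving the out-of-subspace part unchanged.
h_tl = (h_tr - P @ h_tr) \
    + 0.5 * (P @ h_tr) \
    + 0.5 * np.linalg.norm(P @ h_tr) * label_dir
```

Under this toy dynamic, the TR step raises `alignment` without touching the in-subspace direction, and the TL step raises `label_cosine` without touching the out-of-subspace component, mirroring the claimed division of labor between the two head families.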