🤖 AI Summary
Existing implicit in-context learning methods rely on task-specific displacement vectors injected into the residual stream, which leaves the underlying mechanism opaque and generalizes poorly. This paper proposes a **context routing mechanism**, the first to model transferable, structured ICL patterns directly at the attention-logits level: a learnable, input-conditioned router dynamically modulates attention weights, while structural directions extracted from the residual stream deliver few-shot performance at zero-shot cost, without fine-tuning. The method requires neither task-specific alignment nor per-task training: a single trained router is reused across tasks. Evaluated on 12 cross-domain real-world datasets, it significantly outperforms prior implicit approaches, especially on out-of-domain tasks, demonstrating strong generalization and an effective pathway for internalizing in-context learning structure at the attention level.
📝 Abstract
Implicit in-context learning (ICL) has recently emerged as a promising paradigm that simulates ICL behaviors in the representation space of Large Language Models (LLMs), aiming to attain few-shot performance at zero-shot cost. However, existing approaches largely rely on injecting shift vectors into the residual stream, typically constructed from labeled demonstrations or task-specific alignment. Such designs fail to exploit the structural mechanisms underlying ICL and generalize poorly. To address this, we propose In-Context Routing (ICR), a novel implicit ICL method that internalizes generalizable ICL patterns at the attention-logits level. It extracts reusable structural directions that emerge during ICL and employs a learnable, input-conditioned router to modulate attention logits accordingly, enabling a train-once-and-reuse framework. We evaluate ICR on 12 real-world datasets spanning diverse domains and multiple LLMs. ICR consistently outperforms prior implicit ICL methods that require task-specific retrieval or training, and it generalizes robustly to out-of-domain tasks where existing methods struggle. These findings position ICR as a step toward extending the practical value of ICL.
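To make the idea concrete, here is a minimal toy sketch of attention-logit modulation in the spirit of ICR. All names (`routed_attention`, `router_w`, `direction`) and the specific router form are illustrative assumptions, not the paper's actual implementation: a pre-extracted "structural direction" scores each key, and an input-conditioned gate decides how strongly that score biases the attention logits before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routed_attention(Q, K, V, direction, router_w):
    """Toy single-head attention with logit-level routing (illustrative only)."""
    d = Q.shape[-1]
    # Standard scaled dot-product logits, shape (T, T).
    logits = Q @ K.T / np.sqrt(d)
    # Hypothetical input-conditioned router: one gate value per query token.
    gate = softmax(Q @ router_w)[:, :1]          # (T, 1)
    # Structural bias: alignment of each key with a reusable extracted direction.
    bias = K @ direction                          # (T,)
    # Modulate the logits themselves, rather than shifting the residual stream.
    logits = logits + gate * bias[None, :]
    return softmax(logits, axis=-1) @ V

rng = np.random.default_rng(0)
T, d = 4, 8
Q, K, V = rng.normal(size=(3, T, d))
direction = rng.normal(size=d)        # stand-in for an ICL-derived direction
router_w = rng.normal(size=(d, 2))    # stand-in for learned router parameters
out = routed_attention(Q, K, V, direction, router_w)
print(out.shape)  # (4, 8)
```

Because the router gates a fixed bias rather than adding a task-specific vector to hidden states, the same trained parameters can in principle be applied to new inputs without per-task adaptation, which is the train-once-and-reuse property the abstract describes.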