🤖 AI Summary
Existing implicit in-context learning methods rely on task-specific displacement vectors injected into the residual stream, which leaves the underlying mechanism opaque and generalizes poorly. This paper proposes a **context routing mechanism**, the first to model transferable, structured ICL patterns directly at the attention-logits level: a learnable, input-conditioned router dynamically modulates attention weights, while structural directions extracted from the residual stream deliver few-shot performance at zero-shot cost, without fine-tuning. The method requires neither task-specific alignment nor per-task training: a single trained router is reused across tasks. Evaluated on 12 cross-domain real-world datasets, it significantly outperforms prior implicit approaches, especially on out-of-domain tasks, demonstrating strong generalization and an effective pathway for internalizing in-context learning structure at the attention level.
📝 Abstract
Implicit in-context learning (ICL) has recently emerged as a promising paradigm that simulates ICL behaviors in the representation space of Large Language Models (LLMs), aiming to attain few-shot performance at zero-shot cost. However, existing approaches largely rely on injecting shift vectors into the residual stream, typically constructed from labeled demonstrations or task-specific alignment. Such designs fail to exploit the structural mechanisms underlying ICL and generalize poorly. To address this, we propose In-Context Routing (ICR), a novel implicit ICL method that internalizes generalizable ICL patterns at the attention-logits level. It extracts reusable structural directions that emerge during ICL and employs a learnable, input-conditioned router to modulate attention logits accordingly, enabling a train-once-and-reuse framework. We evaluate ICR on 12 real-world datasets spanning diverse domains and multiple LLMs. ICR consistently outperforms prior implicit ICL methods that require task-specific retrieval or training, and it generalizes robustly to out-of-domain tasks where existing methods struggle. These findings position ICR as a step toward extending the practical value of ICL.
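To make the idea concrete, here is a minimal toy sketch of attention-logit modulation in the spirit of ICR. All names (`routed_attention`, `router_w`, `direction`) and the specific router form are illustrative assumptions, not the paper's actual implementation: a pre-extracted "structural direction" scores each key, and an input-conditioned gate decides how strongly that score biases the attention logits before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routed_attention(Q, K, V, direction, router_w):
    """Toy single-head attention with logit-level routing (illustrative only)."""
    d = Q.shape[-1]
    # Standard scaled dot-product logits, shape (T, T).
    logits = Q @ K.T / np.sqrt(d)
    # Hypothetical input-conditioned router: one gate value per query token.
    gate = softmax(Q @ router_w)[:, :1]          # (T, 1)
    # Structural bias: alignment of each key with a reusable extracted direction.
    bias = K @ direction                          # (T,)
    # Modulate the logits themselves, rather than shifting the residual stream.
    logits = logits + gate * bias[None, :]
    return softmax(logits, axis=-1) @ V

rng = np.random.default_rng(0)
T, d = 4, 8
Q, K, V = rng.normal(size=(3, T, d))
direction = rng.normal(size=d)        # stand-in for an ICL-derived direction
router_w = rng.normal(size=(d, 2))    # stand-in for learned router parameters
out = routed_attention(Q, K, V, direction, router_w)
print(out.shape)  # (4, 8)
```

Because the router gates a fixed bias rather than adding a task-specific vector to hidden states, the same trained parameters can in principle be applied to new inputs without per-task adaptation, which is the train-once-and-reuse property the abstract describes.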