How Few-Shot Examples Add Up: A Causal Decomposition of Function Vectors in In-Context Learning

📅 2026-05-15

🤖 AI Summary
This study addresses the limited mechanistic understanding of how few-shot prompts shape function vectors (FVs) in in-context learning. The authors propose a causal decomposition framework that expresses an FV as a linear combination of example-specific sub-vectors, unifying additive superposition with context-adaptive reweighting of demonstrations. Using causal intervention, separation of the Query-Key and Value pathways, and attention tracing, they show empirically that this additive approximation holds across multiple tasks and models. Their analysis further identifies Query-Key alignment as the most consistent contributor to FV quality, particularly in ambiguous settings, while Value-mediated effects are more heterogeneous.
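The additive claim can be illustrated with a small least-squares fit. This is a minimal sketch, not the authors' procedure: the vectors are synthetic toy data, and the helper names (`fit_weights`, `relative_residual`) and the normal-equations solve are assumptions made for illustration only.

```python
# Toy illustration (assumed setup, not the paper's code): approximate an
# n-shot function vector as a weighted sum of example-level sub-FVs.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_weights(sub_fvs, fv):
    """Least-squares weights w minimizing ||fv - sum_i w_i * sub_fvs[i]||."""
    gram = [[dot(u, v) for v in sub_fvs] for u in sub_fvs]
    rhs = [dot(u, fv) for u in sub_fvs]
    return solve(gram, rhs)

def relative_residual(sub_fvs, fv, weights):
    """Norm of the approximation error divided by the norm of fv."""
    approx = [sum(w * v[k] for w, v in zip(weights, sub_fvs))
              for k in range(len(fv))]
    err = sum((a - b) ** 2 for a, b in zip(approx, fv)) ** 0.5
    return err / dot(fv, fv) ** 0.5

# Three hypothetical sub-FVs in a 4-dimensional activation space.
sub_fvs = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 0.0, 1.0],
           [0.0, 0.0, 1.0, 1.0]]
# A synthetic 3-shot FV built as 0.5*v1 + 0.3*v2 + 0.2*v3.
fv = [0.5, 0.3, 0.2, 1.0]

weights = fit_weights(sub_fvs, fv)
residual = relative_residual(sub_fvs, fv, weights)
```

On real activations the residual would not be exactly zero; a small relative residual is what "well-approximated by a linear combination" would look like in this framing.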
📝 Abstract
In-context learning (ICL) excels at new tasks from minimal examples, yet we still lack a mechanistic explanation of how few-shot prompts shape a model's function vector (FV), a causal activation direction that drives task behavior on the ICL query. Across tasks and models, an $n$-shot FV is well-approximated by a linear combination of example-level sub-FVs, suggesting additive and composable contributions from individual demonstrations. Beyond additivity, we show that models contextualize individual examples' representations based on prior examples to adaptively reweight which demonstrations dominate the FV: attention shifts toward examples that are more informative and less ambiguous under the context. Finally, a causal decomposition separates Query-Key routing from Value updates, finding that contextualization's most consistent contributions to FV quality arise from Query-Key alignment (particularly in ambiguous settings), while Value-mediated effects are more heterogeneous. Together, these results unify additive superposition with context-dependent attention reweighting into a mechanistic, testable account of how few-shot prompts implement tasks.
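One way to picture the split between Query-Key routing and Value updates is a patching experiment on a single attention head. The sketch below is an assumption-laden toy, not the paper's method: it uses one head, hand-picked 2-dimensional vectors, and hypothetical "isolated" versus "contextualized" keys and values for two demonstrations. Swapping only the keys changes where attention goes; swapping only the values changes what is read out.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query position."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return out, weights

# Hypothetical toy activations for two demonstrations.
query = [1.0, 0.0]
keys_iso = [[1.0, 0.0], [1.0, 0.0]]    # examples encoded in isolation
keys_ctx = [[0.0, 0.0], [2.0, 0.0]]    # examples encoded with prior context
values_iso = [[1.0, 0.0], [0.0, 1.0]]
values_ctx = [[1.0, 0.0], [0.0, 2.0]]

# Four patching conditions: change nothing, only keys, only values, or both.
baseline, w_iso = attend(query, keys_iso, values_iso)
qk_only, w_ctx = attend(query, keys_ctx, values_iso)   # routing effect
v_only, _ = attend(query, keys_iso, values_ctx)        # value effect
full, _ = attend(query, keys_ctx, values_ctx)
```

Comparing `qk_only` and `v_only` against `baseline` attributes the change in the head's output to each pathway separately; in this toy, contextualized keys shift attention toward the second demonstration while leaving the values untouched.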
Problem

Research questions and friction points this paper is trying to address.

in-context learning
function vector
few-shot learning
causal mechanism
attention reweighting
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context learning
function vector
causal decomposition
attention reweighting
additive superposition