🤖 AI Summary
This work investigates how large language models (LLMs) represent tasks during in-context learning (ICL), challenging the "global task vector" hypothesis. The authors find that ICL does not rely on a single, unified task vector; instead, the rule information from each demonstration is encoded as a distinct **local rule vector**, and these vectors are aggregated at the output position in a distributed manner. This **distributed representation mechanism of rule vectors** is validated across synthetic and real-world tasks whose rules depend on multiple demonstrations. The methodology combines causal mediation analysis, patching experiments, attention-flow tracking, and vector-space decomposition. Results show that rule vectors encode high-level abstractions of the demonstrated rules and support accurate predictions, yielding a unified, interpretable information-aggregation account of ICL and substantially improving its behavioral interpretability.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable abilities, one of the most important being In-Context Learning (ICL). With ICL, LLMs can derive the underlying rule from a few demonstrations and provide answers that comply with it. Previous work hypothesized that the network creates a "task vector" at specific positions during ICL, and that patching this "task vector" into a zero-shot run allows LLMs to achieve performance similar to few-shot learning. However, we discover that such "task vectors" do not exist in tasks where the rule must be defined through multiple demonstrations. Instead, the rule information provided by each demonstration is first transmitted to its answer position, where it forms its own rule vector. Importantly, all the rule vectors contribute to the output in a distributed manner. We further show that the rule vectors encode a high-level abstraction of the rules extracted from the demonstrations. These results are validated across a series of tasks whose rules depend on multiple demonstrations. Our study provides novel insights into the mechanism underlying ICL in LLMs, demonstrating how ICL may be achieved through an information-aggregation mechanism.
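The activation-patching probe described in the abstract — caching a hidden state from a few-shot run and substituting it into a zero-shot run — can be illustrated with a toy sketch. Everything below is hypothetical: the "model", the embedding, and the token format are invented for illustration only; a real experiment would hook a transformer layer's residual stream in the actual LLM, not this stand-in.

```python
# Toy illustration of the activation-patching protocol used to probe
# "task vectors". The model and embeddings here are hypothetical
# stand-ins, NOT the paper's setup.

def embed(token):
    # Hypothetical 2-d embedding: (token length, rule-signal flag).
    return [float(len(token)), 1.0 if "->" in token else 0.0]

def forward(tokens, patches=None):
    """Return per-position hidden states and a prediction readout.

    `patches` maps position -> vector to substitute before the readout;
    this substitution is the patching intervention.
    """
    hidden, acc = [], [0.0, 0.0]
    for t in tokens:
        e = embed(t)
        acc = [acc[0] + e[0], acc[1] + e[1]]  # running aggregation
        hidden.append(list(acc))
    if patches:
        for pos, vec in patches.items():
            hidden[pos] = list(vec)  # overwrite cached activation
    # Readout: the prediction is driven by the final position's state.
    return hidden, hidden[-1]

# Few-shot prompt: two demonstrations followed by a query.
few_shot = ["a->b", "c->d", "e->"]
hid_fs, pred_fs = forward(few_shot)

# Zero-shot prompt: the query alone.
_, pred_zs = forward(["e->"])

# Patch the few-shot final-position state into the zero-shot run:
# the zero-shot prediction now matches the few-shot one.
_, pred_patched = forward(["e->"], patches={0: hid_fs[-1]})
```

The paper's multi-demonstration tasks are exactly the regime where patching a single cached vector like this fails, and patching the per-demonstration answer-position states (the local rule vectors) is needed instead.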