🤖 AI Summary
Existing task vectors (TVs) are typically extracted post-hoc from LLM outputs or hidden states, resulting in cumbersome, opaque, and non-differentiable procedures.
Method: We propose Learned Task Vectors (LTVs)—end-to-end trainable, parameterized task representations that can be flexibly injected at arbitrary Transformer layers, token positions, and prompt contexts. Leveraging OV circuit analysis and linear dynamical modeling, we characterize how LTVs modulate information flow within attention heads via subspace rotation and scaling. Controlled intervention experiments validate their mechanistic role.
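The injection operation itself is simple to picture. Below is a minimal sketch (not the authors' code; the function name `inject_ltv` and the calling convention are hypothetical) of adding a trained task vector to the residual stream at a chosen layer's input, at one token position — in the actual method this vector would be trained end-to-end by backpropagating the task loss through the frozen LLM:

```python
import numpy as np

def inject_ltv(hidden_states, ltv, position):
    """Add a learned task vector (LTV) to the residual stream.

    hidden_states: (seq_len, d_model) activations entering the chosen layer.
    ltv:           (d_model,) task vector, assumed already trained.
    position:      token index at which the vector is injected.
    """
    out = hidden_states.copy()
    out[position] = out[position] + ltv  # additive intervention only
    return out

# Toy usage: inject at the last token of a 4-token, 8-dim "model".
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
v = rng.normal(size=8)
H_new = inject_ltv(H, v, position=-1)
```

Because the intervention is a plain addition, it is differentiable with respect to `ltv`, which is what makes end-to-end training straightforward compared with post-hoc extraction.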
Results: On multi-task benchmarks, LTVs significantly outperform conventional extracted TVs in accuracy. Moreover, they provide the first interpretable, mechanistic account of how task-specific signals propagate through attention subspaces, unifying performance gains with computational traceability and model transparency.
📝 Abstract
Large Language Models (LLMs) can perform new tasks from in-context demonstrations, a phenomenon known as in-context learning (ICL). Recent work suggests that these demonstrations are compressed into task vectors (TVs), compact task representations that LLMs exploit for predictions. However, prior studies typically extract TVs from model outputs or hidden states using cumbersome and opaque methods, and they rarely elucidate the mechanisms by which TVs influence computation. In this work, we address both limitations. First, we propose directly training Learned Task Vectors (LTVs), which surpass extracted TVs in accuracy and exhibit superior flexibility, acting effectively at arbitrary layers and positions, and even with ICL prompts. Second, through systematic analysis, we investigate the mechanistic role of TVs, showing that at the low level they steer predictions primarily through attention-head OV circuits, with a small subset of "key heads" most decisive. At a higher level, we find that despite Transformer nonlinearities, TV propagation is largely linear: early TVs are rotated toward task-relevant subspaces to improve the logits of relevant labels, while later TVs are predominantly scaled in magnitude. Taken together, LTVs not only provide a practical approach for obtaining effective TVs but also offer a principled lens into the mechanistic foundations of ICL.
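The rotation-versus-scaling claim can be made concrete with a simple diagnostic (a sketch, not the paper's analysis code): comparing a task vector before and after it propagates through a layer, the cosine between the two measures rotation, while the norm ratio measures scaling.

```python
import numpy as np

def rotation_and_scaling(tv_in, tv_out):
    """Decompose the change in a task vector across one layer.

    Returns (cosine, scale): cosine near 1 means the direction is
    preserved (mostly scaling); cosine well below 1 means the vector
    was rotated toward a different subspace. `scale` is the norm
    ratio ||tv_out|| / ||tv_in||.
    """
    cosine = float(np.dot(tv_in, tv_out) /
                   (np.linalg.norm(tv_in) * np.linalg.norm(tv_out)))
    scale = float(np.linalg.norm(tv_out) / np.linalg.norm(tv_in))
    return cosine, scale

# Pure scaling: direction preserved, magnitude doubled.
v = np.array([1.0, 2.0, 3.0])
cos_s, scale_s = rotation_and_scaling(v, 2.0 * v)

# Pure rotation: orthogonal direction, same magnitude.
w = np.array([-2.0, 1.0, 0.0])
w = w * (np.linalg.norm(v) / np.linalg.norm(w))
cos_r, scale_r = rotation_and_scaling(v, w)
```

Under the abstract's finding, early layers would show cosines noticeably below 1 (rotation into task-relevant subspaces) while later layers would show cosines near 1 with scale drifting away from 1 (magnitude changes).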