Soft Injection of Task Embeddings Outperforms Prompt-Based In-Context Learning

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing in-context learning (ICL) relies on hand-crafted input-output exemplars, suffering from low efficiency and limited generalization. This paper proposes a task-embedding soft injection method that shifts task conditioning from the prompt space into the model’s activation space, enabling task execution without explicit demonstrations. Methodologically, it introduces soft mixing of attention head activations with pre-optimized parameters to enable task-embedding reuse; reveals the task-specific functionality—and cross-task transferability—of individual attention heads; and constructs task embeddings via few-shot prompts, modulating activations in attention layers using learnable soft head selection parameters. Experiments across 57 tasks and 12 large language models show consistent gains: the method outperforms 10-shot ICL by 10.1–13.9% on average, while substantially reducing memory footprint and inference overhead.

📝 Abstract
In-Context Learning (ICL) enables Large Language Models (LLMs) to perform tasks by conditioning on input-output examples in the prompt, without requiring any update in model parameters. While widely adopted, it remains unclear whether prompting with multiple examples is the most effective and efficient way to convey task information. In this work, we propose Soft Injection of task embeddings. The task embeddings are constructed only once using few-shot ICL prompts and repeatedly used during inference. Soft injection is performed by softly mixing task embeddings with attention head activations using pre-optimized mixing parameters, referred to as soft head-selection parameters. This method not only allows a desired task to be performed without in-prompt demonstrations but also significantly outperforms existing ICL approaches while reducing memory usage and compute cost at inference time. An extensive evaluation is performed across 57 tasks and 12 LLMs, spanning four model families of sizes from 4B to 70B. Averaged across 57 tasks, our method outperforms 10-shot ICL by 10.1%-13.9% across 12 LLMs. Additional analyses show that our method also serves as an insightful tool for analyzing task-relevant roles of attention heads, revealing that task-relevant head positions selected by our method transfer across similar tasks but not across dissimilar ones -- underscoring the task-specific nature of head functionality. Our soft injection method opens a new paradigm for reducing prompt length and improving task performance by shifting task conditioning from the prompt space to the activation space.
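The abstract states that task embeddings are "constructed only once using few-shot ICL prompts and repeatedly used during inference," but this page does not spell out the construction formula. A minimal sketch, assuming the embedding is simply the average of per-head activations (e.g., at the final token) collected from several few-shot prompts — the function name and tensor layout here are illustrative, not the paper's API:

```python
import numpy as np

def build_task_embedding(per_prompt_head_activations):
    """Average per-head activations over several few-shot ICL prompts.

    per_prompt_head_activations: array of shape (num_prompts, num_heads, head_dim),
    e.g. the attention head outputs at the last token of each few-shot prompt.
    Returns a (num_heads, head_dim) task embedding, computed once and reused
    at inference time without any in-prompt demonstrations.
    """
    return per_prompt_head_activations.mean(axis=0)
```

The one-time cost of running the few-shot prompts is amortized: afterwards, inference carries no demonstrations in the prompt, which is where the reported memory and compute savings come from.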
Problem

Research questions and friction points this paper is trying to address.

Whether prompting with multiple examples is the most effective and efficient way to convey task information
How to perform tasks without in-prompt demonstrations while reducing memory usage and compute cost at inference
How to shift task conditioning from the prompt space to the activation space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Soft injection of task embeddings into attention heads
Constructs task embeddings from few-shot ICL prompts
Reduces memory usage and compute cost significantly
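The injection step itself is described as "softly mixing task embeddings with attention head activations using pre-optimized mixing parameters." A plausible minimal sketch, assuming each head gets a learnable selection logit that is squashed to a mixing weight and used for a per-head convex combination — the exact parameterization is an assumption, not taken from the paper:

```python
import numpy as np

def soft_inject(head_activations, task_embedding, selection_logits):
    """Softly mix a pre-computed task embedding into attention head activations.

    head_activations: (num_heads, head_dim) activations at the current token
    task_embedding:   (num_heads, head_dim) pre-computed per-head task embedding
    selection_logits: (num_heads,) learnable soft head-selection parameters
    """
    # Sigmoid maps each logit to a mixing weight in (0, 1): weights near 1
    # select the head for injection, weights near 0 leave it untouched.
    lam = 1.0 / (1.0 + np.exp(-selection_logits))
    lam = lam[:, None]  # broadcast the per-head weight over head_dim
    return (1.0 - lam) * head_activations + lam * task_embedding
```

Because the selection weights are per-head, optimizing them doubles as an analysis tool: heads with high weights are the ones carrying task-relevant function, which is how the paper probes cross-task transferability of head positions.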
Jungwon Park
Department of Intelligence and Information, Seoul National University
Wonjong Rhee
Seoul National University
Deep Learning Theory · Artificial Intelligence · Information Theory