🤖 AI Summary
Existing activation steering methods rely on a single static direction, limiting their adaptability to diverse tasks and to complex tasks that require coordinating multiple capabilities. This work proposes a dynamic activation steering framework that, during inference, efficiently synthesizes task-specific steering vectors by linearly combining basis vectors within a predefined low-dimensional semantic prior subspace, using only a few examples and no retraining. The approach substantially improves adaptation flexibility and data efficiency, achieving an average performance gain of 8.2% across three large language models and nine tasks. It also improves the stability and interpretability of model control, enabling more precise and reliable steering of model behavior in varied contexts.
📝 Abstract
Activation steering has emerged as a promising approach for efficiently adapting large language models (LLMs) to downstream behaviors. However, most existing steering methods rely on a single static direction per task or concept, making them inflexible under task variation and inadequate for complex tasks that require multiple coordinated capabilities. To address this limitation, we propose STEER2ADAPT, a lightweight framework that adapts LLMs by composing steering vectors rather than learning new ones from scratch. In many domains (e.g., reasoning or safety), tasks share a small set of underlying concept dimensions. STEER2ADAPT captures these dimensions as a reusable, low-dimensional semantic prior subspace, and adapts to new tasks by dynamically discovering a linear combination of basis vectors from only a handful of examples. Experiments across 9 tasks and 3 models in both reasoning and safety domains demonstrate the effectiveness of STEER2ADAPT, achieving an average improvement of 8.2%. Extensive analyses further show that STEER2ADAPT is a data-efficient, stable, and transparent inference-time adaptation method for LLMs.
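The core mechanism the abstract describes, composing a task-specific steering vector as a linear combination of basis vectors fit from a handful of examples, can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: the function names, the ridge-regularized least-squares fit of the combination coefficients, and the simple additive application to a hidden state are all assumptions introduced here.

```python
import numpy as np

def fit_steering_vector(basis, example_deltas, reg=1e-3):
    """Compose a steering vector from a fixed basis of concept directions.

    basis: (k, d) array, k basis steering directions spanning the prior subspace.
    example_deltas: (n, d) activation differences observed on n task examples.
    Returns (alpha, vector): the combination coefficients and the composed
    steering vector basis.T @ alpha.
    """
    target = example_deltas.mean(axis=0)  # mean activation shift, shape (d,)
    # Ridge-regularized least squares:
    #   alpha = argmin_a ||basis.T @ a - target||^2 + reg * ||a||^2
    A = basis @ basis.T + reg * np.eye(basis.shape[0])
    alpha = np.linalg.solve(A, basis @ target)
    return alpha, basis.T @ alpha

def apply_steering(hidden, steer_vec, scale=1.0):
    """Add the composed steering vector to a hidden state at inference time."""
    return hidden + scale * steer_vec
```

Because only k coefficients are estimated rather than a full d-dimensional direction, a few examples suffice, which is consistent with the data efficiency the abstract claims.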