🤖 AI Summary
Existing linear activation steering methods apply a fixed steering strength to all tokens, so they cannot adapt to varying input contexts and steer inconsistently across prompts. This work proposes Contextual Linear Activation Steering (CLAS), which introduces, for the first time, a dynamic steering mechanism that adjusts the strength based on the input context without requiring additional trainable parameters. Experiments across 11 steering benchmarks and 4 model families show that CLAS consistently outperforms standard linear activation steering and, when labeled data is limited, matches or exceeds parameter-efficient methods such as ReFT and LoRA. These results position CLAS as a precise and data-efficient strategy for guiding model behavior.
📝 Abstract
Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input prompts. In this work, we introduce Contextual Linear Activation Steering (CLAS), a method that dynamically adapts the steering strength of linear activation steering to the input context. Across eleven steering benchmarks and four model families, CLAS consistently outperforms standard linear activation steering and matches or exceeds the performance of ReFT and LoRA in settings with limited labeled data. We therefore propose CLAS as a scalable, interpretable, and accurate method for specializing and steering large language models.
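To make the contrast between fixed and context-dependent steering strength concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the CLAS implementation: the abstract does not specify how the strength is computed, so the function names and the `gap_to_target` rule (push each token's projection onto the steering direction up to a target value) are hypothetical stand-ins.

```python
import numpy as np


def steer_fixed(hidden, direction, alpha):
    """Standard linear steering: add the same multiple of `direction` to every token."""
    return hidden + alpha * direction


def steer_contextual(hidden, direction, strength_fn):
    """Context-dependent steering: each token gets its own strength.

    `strength_fn` maps activations of shape (num_tokens, dim) to strengths of
    shape (num_tokens,). How CLAS derives these strengths is not stated in the
    abstract, so this argument is a placeholder for that mechanism.
    """
    alphas = strength_fn(hidden)
    return hidden + alphas[:, None] * direction


def gap_to_target(direction, target):
    """Hypothetical strength rule (an assumption, not taken from the paper).

    Tokens whose projection onto `direction` is already at or above `target`
    are left alone; the rest are pushed exactly up to `target`.
    """
    norm = np.linalg.norm(direction)
    unit = direction / norm

    def strength_fn(hidden):
        projection = hidden @ unit
        return np.maximum(target - projection, 0.0) / norm

    return strength_fn


# Two toy token activations: one far from the target behavior, one already past it.
hidden = np.array([[0.0, 0.0], [3.0, 0.0]])
direction = np.array([2.0, 0.0])

fixed = steer_fixed(hidden, direction, alpha=1.0)
contextual = steer_contextual(hidden, direction, gap_to_target(direction, target=2.0))
```

In this toy setup the fixed variant shifts both tokens by the same amount, including the one that already sits past the target, while the contextual variant only moves the token that needs it. That is the kind of per-input inconsistency the abstract attributes to fixed-strength steering.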