Contextual Linear Activation Steering of Language Models

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing linear activation steering methods apply a fixed steering strength to every token, so they cannot adapt to different input contexts and their control quality is inconsistent across prompts. This work proposes Contextual Linear Activation Steering (CLAS), which replaces the fixed strength with a dynamic one that adapts to the input context without requiring additional trainable parameters. Experiments across 11 steering benchmarks and 4 model families show that CLAS consistently outperforms standard linear activation steering and, when labeled data is limited, matches or exceeds parameter-efficient methods such as ReFT and LoRA. These results position CLAS as a more precise and data-efficient strategy for guiding model behavior.

📝 Abstract

Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input prompts. In this work, we introduce Contextual Linear Activation Steering (CLAS), a method that dynamically adapts linear activation steering to context-dependent steering strengths. Across eleven steering benchmarks and four model families, it consistently outperforms standard linear activation steering and matches or exceeds the performance of ReFT and LoRA in settings with limited labeled data. We therefore propose CLAS as a scalable, interpretable, and accurate method for specializing and steering large language models.
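
The contrast between a fixed and a context-dependent steering strength can be made concrete with a small NumPy sketch. This is an illustration only, not the paper's implementation: the abstract does not say how CLAS computes its strengths, so the rule used here (`projection_gap_strength`, which pushes each token's projection onto the steering direction up to a chosen target) is a hypothetical stand-in, as are all function and parameter names.

```python
import numpy as np

def fixed_steering(hidden, direction, alpha):
    """Standard linear steering: add the same multiple of `direction` to every token.

    hidden: (seq_len, d_model) activations; direction: (d_model,) steering vector.
    """
    return hidden + alpha * direction

def contextual_steering(hidden, direction, strength_fn):
    """Context-dependent steering: each token gets its own strength.

    `strength_fn` maps (hidden, direction) to a (seq_len,) array of strengths.
    """
    alphas = strength_fn(hidden, direction)
    return hidden + alphas[:, None] * direction

def projection_gap_strength(hidden, direction, target=1.0):
    """HYPOTHETICAL strength rule (an assumption, not taken from the paper):
    steer harder when a token's projection onto the direction is far below `target`.
    """
    unit = direction / np.linalg.norm(direction)
    return target - hidden @ unit

# Three toy token activations in a 2-dimensional residual stream.
hidden = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])
direction = np.array([1.0, 0.0])

fixed = fixed_steering(hidden, direction, alpha=0.5)
contextual = contextual_steering(hidden, direction, projection_gap_strength)
```

With the fixed strength, every token moves by the same amount regardless of where it started. With the hypothetical contextual rule, a token already aligned with the direction is left alone while a token pointing away from it receives a larger push.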

Problem

Research questions and friction points this paper is trying to address.

linear activation steering

context-dependent steering

language model specialization

steering consistency

limited labeled data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual Linear Activation Steering

dynamic steering strength

large language models