Fine-Grained Activation Steering: Steering Less, Achieving More

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing activation-based interventions in large language models typically operate at the block level, resulting in coarse-grained control, low efficiency, and significant interference due to the neglect of internal heterogeneity within activations. This work is the first to uncover the root cause of such heterogeneity and proposes a fine-grained intervention paradigm based on atomic units (AUs). By analyzing activation dynamics and selecting contrastive samples, the method identifies AUs that are most discriminative for model outputs and applies adaptive intervention strengths accordingly. Evaluated across multiple mainstream large language models and tasks, the approach substantially outperforms state-of-the-art baselines, achieving stronger behavioral steering with minimal intervention—demonstrating a highly efficient “less intervention, more effect” control capability.

📝 Abstract
Activation steering has emerged as a cost-effective paradigm for modifying large language model (LLM) behaviors. Existing methods typically intervene at the block level, steering the bundled activations of selected attention heads, feedforward networks, or residual streams. However, we reveal that block-level activations are inherently heterogeneous, entangling beneficial, irrelevant, and harmful features, thereby rendering block-level steering coarse, inefficient, and intrusive. To investigate the root cause, we decompose block activations into fine-grained atomic unit (AU)-level activations, where each AU-level activation corresponds to a single dimension of the block activation, and each AU denotes a slice of the block weight matrix. Steering an AU-level activation is thus equivalent to steering its associated AU. Our theoretical and empirical analyses show that heterogeneity arises because different AUs, or dimensions, control distinct token distributions in LLM outputs. Hence, block-level steering inevitably moves helpful and harmful token directions together, which reduces efficiency. Restricting intervention to beneficial AUs yields more precise and effective steering. Building on this insight, we propose AUSteer, a simple and efficient method that operates at the finer granularity of the AU level. AUSteer first identifies discriminative AUs globally by computing activation momenta on contrastive samples. It then assigns adaptive steering strengths tailored to diverse inputs and selected AU activations. Comprehensive experiments on multiple LLMs and tasks show that AUSteer consistently surpasses advanced baselines while steering considerably fewer activations, demonstrating that steering less achieves more.
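The abstract's pipeline (decompose block activations into per-dimension AUs, score AUs on contrastive samples, steer only the selected dimensions) can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the variable names are made up, and the mean-difference score below is a stand-in for the paper's "activation momenta."

```python
import numpy as np

# Toy contrastive data: block activations collected on positive vs. negative
# prompts. A few dimensions (AUs) are made artificially discriminative.
rng = np.random.default_rng(0)
d = 64                                     # block activation dimensionality
pos = rng.normal(size=(32, d))             # activations on positive prompts
neg = rng.normal(size=(32, d))             # activations on negative prompts
pos[:, :4] += 2.0                          # dims 0-3 carry the target feature

# Score each dimension (AU) by the contrast between sample means
# (a simple proxy for the paper's activation momenta).
score = pos.mean(axis=0) - neg.mean(axis=0)
k = 4
aus = np.argsort(-np.abs(score))[:k]       # most discriminative AUs

def steer(h, strength=1.0):
    """Steer only the selected AU dimensions of activation vector h."""
    out = h.copy()
    out[aus] += strength * score[aus]      # scaled by per-AU score
    return out

h = rng.normal(size=d)
h2 = steer(h)
changed = np.flatnonzero(h2 != h)          # only the k selected dims move
```

The point of the sketch is the contrast with block-level steering: all other dimensions of `h` pass through untouched, so irrelevant and harmful directions are never moved.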
Problem

Research questions and friction points this paper is trying to address.

activation steering
block-level intervention
activation heterogeneity
fine-grained control
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

fine-grained activation steering
atomic unit
activation heterogeneity
AUSteer
adaptive steering
Zijian Feng
Nanyang Technological University
NLP, LLM, MLLM

Tianjiao Li
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

Zixiao Zhu
Nanyang Technological University
artificial intelligence

Hanzhang Zhou
Nanyang Technological University
Large Language Models, Mechanistic Interpretability, Natural Language Processing

Junlang Qian
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

Li Zhang
Associate Professor, School of Software, Tsinghua University
Big Data, Workflow, Information System

Jia Jim Deryl Chua
Home Team Science and Technology Agency (HTX), Singapore

L. Mak
Home Team Science and Technology Agency (HTX), Singapore

Gee Wah Ng
Home Team Science and Technology Agency (HTX), Singapore

Kezhi Mao
Nanyang Technological University
machine learning, natural language processing, image processing, information fusion