MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models

📅 2025-08-14
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Existing methods for controlling multiple behavioral attributes in large language models suffer from interference and trade-offs caused by attribute coupling. To address this, the authors propose the Multi-Subspace Representation Steering (MSRS) framework, which decouples control attributes by allocating them orthogonal subspaces. MSRS introduces three key mechanisms: (i) a decomposition into shared and attribute-specific subspaces, (ii) token-level intervention on the most semantically relevant tokens, and (iii) dynamic weight fusion. Together these enable fine-grained, low-interference joint control, and they reduce attribute conflict without requiring hand-crafted constraints. Empirically, MSRS outperforms existing methods on instruction following, style transfer, and other multi-attribute tasks, and it generalizes to diverse downstream tasks. Its core idea is to formulate attribute control as learned, dynamically weighted, token-level interventions within orthogonal subspaces, which yields both disentanglement and compositional flexibility.

📝 Abstract
Activation steering offers a promising approach to controlling the behavior of Large Language Models by directly manipulating their internal activations. However, most existing methods struggle to jointly steer multiple attributes, often resulting in interference and undesirable trade-offs. To address this challenge, we propose Multi-Subspace Representation Steering (MSRS), a novel framework for effective multi-attribute steering via subspace representation fine-tuning. MSRS reduces inter-attribute interference by allocating orthogonal subspaces to each attribute, isolating their influence within the model's representation space. MSRS also incorporates a hybrid subspace composition strategy: it combines attribute-specific subspaces for unique steering directions with a shared subspace for common steering directions. A dynamic weighting function learns to efficiently integrate these components for precise control. During inference, MSRS introduces a token-level steering mechanism that dynamically identifies and intervenes on the most semantically relevant tokens, enabling fine-grained behavioral modulation. Experimental results show that MSRS significantly reduces attribute conflicts, surpasses existing methods across a range of attributes, and generalizes effectively to diverse downstream tasks.
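The isolation property described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`orthonormal_rows`, `coords`, `steer`), the subspace sizes, the steering offsets, and the fixed weights `w_shared` and `w_a` are all assumptions made for illustration, and the fixed weights only stand in for the learned dynamic weighting function. The sketch shows that when subspaces are mutually orthogonal, steering along the shared subspace and one attribute's subspace leaves the other attribute's coordinates unchanged.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_rows(dim, count, seed=0):
    """Build `count` orthonormal vectors in R^dim via Gram-Schmidt on random draws."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < count:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:
            c = dot(v, r)
            v = [x - c * y for x, y in zip(v, r)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-8:
            rows.append([x / norm for x in v])
    return rows

def coords(h, subspace):
    """Coordinates of hidden state h along each basis vector of a subspace."""
    return [dot(h, r) for r in subspace]

def steer(h, subspace, offsets, weight):
    """Shift h inside one subspace by weight * offsets (hypothetical steering rule)."""
    out = list(h)
    for c, r in zip(offsets, subspace):
        out = [x + weight * c * y for x, y in zip(out, r)]
    return out

dim = 16
rows = orthonormal_rows(dim, 6)
# Assumed layout: one shared subspace plus one subspace per attribute.
shared, attr_a, attr_b = rows[0:2], rows[2:4], rows[4:6]

rng = random.Random(1)
h = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Fixed weights stand in for the learned dynamic weighting function.
w_shared, w_a = 0.3, 0.7
h_steered = steer(h, shared, [0.5, 0.5], w_shared)
h_steered = steer(h_steered, attr_a, [1.0, -0.5], w_a)

# Attribute B's coordinates should be untouched up to floating-point error.
drift_b = max(abs(x - y) for x, y in zip(coords(h, attr_b), coords(h_steered, attr_b)))
shift_a = [y - x for x, y in zip(coords(h, attr_a), coords(h_steered, attr_a))]
print("drift in attribute B coordinates:", drift_b)
print("shift in attribute A coordinates:", shift_a)
```

In this toy setting the drift in attribute B's coordinates is at the level of floating-point noise, while attribute A's coordinates move by the weighted offsets. Without the orthogonality constraint, the same update would generally leak into attribute B's coordinates.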
Problem

Research questions and friction points this paper is trying to address.

Control multiple attributes in LLMs without interference
Allocate orthogonal subspaces to isolate attribute influence
Enable fine-grained behavioral modulation via token-level steering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal subspaces reduce attribute interference
Hybrid subspace composition combines specific and shared directions
Token-level steering enables fine-grained behavioral modulation
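Token-level steering, the last point above, can be sketched as follows. The summary does not state how MSRS scores token relevance, so this example assumes a simple stand-in: each token is scored by the absolute value of its hidden state's dot product with a hypothetical attribute direction, and only the top-k tokens are shifted. The function names, the toy hidden states, and the `strength` and `k` parameters are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def select_tokens(hidden_states, direction, k):
    """Indices of the k tokens whose hidden states align most with `direction`."""
    scores = [abs(dot(h, direction)) for h in hidden_states]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def steer_selected(hidden_states, direction, strength, k):
    """Shift only the selected tokens along `direction`; copy the rest unchanged."""
    chosen = set(select_tokens(hidden_states, direction, k))
    return [
        [x + strength * d for x, d in zip(h, direction)] if i in chosen else list(h)
        for i, h in enumerate(hidden_states)
    ]

# Toy sequence of four token hidden states in R^3 (made-up values).
hidden_states = [
    [0.1, 0.0, 0.2],
    [0.9, 0.1, 0.0],
    [0.0, 0.3, 0.1],
    [-0.7, 0.2, 0.0],
]
direction = [1.0, 0.0, 0.0]  # hypothetical attribute direction

chosen = select_tokens(hidden_states, direction, k=2)
steered = steer_selected(hidden_states, direction, strength=0.5, k=2)
print("steered token indices:", chosen)
```

Restricting the intervention to a few tokens keeps the remaining hidden states exactly as the model produced them, which is one plausible reading of "fine-grained behavioral modulation".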
Authors
Xinyan Jiang
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Lin Zhang
Provable Responsible AI and Data Analytics (PRADA) Lab
Jiayi Zhang
University of Copenhagen, Copenhagen, Denmark
Qingsong Yang
University of Science and Technology of China, Hefei, China
Guimin Hu
University of Copenhagen
Multimodal Learning, Natural Language Processing, Affective Computing, Haptic Understanding
Di Wang
Provable Responsible AI and Data Analytics (PRADA) Lab
Lijie Hu
Assistant Professor, MBZUAI
Explainable AI, LLM, Differential Privacy