Beyond Linear Steering: Unified Multi-Attribute Control for Language Models

📅 2025-05-30
🤖 AI Summary
Problem: Controlling multiple behavioral attributes simultaneously during large language model (LLM) inference is difficult for three reasons. Linear steering assumes that interventions combine additively, attributes interfere with one another, and a separate direction vector must be stored and tuned for each attribute.

Method: We propose K-Steering, a non-linear, fine-tuning-free framework that replaces additive linear interventions with a single multi-label classifier trained on hidden-layer activations. At inference time, it computes composite intervention directions from the gradients of this classifier with respect to the activations.

Contributions/Results: We introduce two new benchmarks, ToneBank and DebateMix, that target compositional behavioral control. Outputs are evaluated with both activation-based classifiers and LLM judges. Across three model families, K-Steering outperforms linear baselines at steering multiple attributes jointly, and new attribute combinations can be composed without retraining.

📝 Abstract
Controlling multiple behavioral attributes in large language models (LLMs) at inference time is a challenging problem due to interference between attributes and the limitations of linear steering methods, which assume additive behavior in activation space and require per-attribute tuning. We introduce K-Steering, a unified and flexible approach that trains a single non-linear multi-label classifier on hidden activations and computes intervention directions via gradients at inference time. This avoids linearity assumptions, removes the need for storing and tuning separate attribute vectors, and allows dynamic composition of behaviors without retraining. To evaluate our method, we propose two new benchmarks, ToneBank and DebateMix, targeting compositional behavioral control. Empirical results across 3 model families, validated by both activation-based classifiers and LLM-based judges, demonstrate that K-Steering outperforms strong baselines in accurately steering multiple behaviors.
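
The contrast between the two intervention styles can be written out as follows. This is an illustrative formulation, not the paper's exact objective. It assumes that the classifier $f_\theta:\mathbb{R}^d\to\mathbb{R}^K$ outputs one logit per attribute with independent sigmoid probabilities $p_k(h)=\sigma(f_\theta(h)_k)$, that $T$ is the set of requested attributes, and that a single gradient step of size $\eta$ is taken. Linear steering adds stored, separately tuned vectors $v_k$ with coefficients $\alpha_k$:

$$\tilde{h} = h + \sum_{k \in T} \alpha_k v_k$$

Gradient-based steering instead descends a classification loss on the activation $h$:

$$\mathcal{L}_T(h) = -\sum_{k \in T} \log p_k(h), \qquad \tilde{h} = h - \eta \, \nabla_h \mathcal{L}_T(h)$$

If $f_\theta$ were linear, $f_\theta(h) = Wh + b$ with rows $w_k$, then $\nabla_h \mathcal{L}_T(h) = -\sum_{k \in T} (1 - p_k(h))\, w_k$. The update would reduce to $\tilde{h} = h + \eta \sum_{k \in T} (1 - p_k(h))\, w_k$, which is additive steering with $h$-dependent weights. With a non-linear $f_\theta$, the direction itself changes with $h$. Changing $T$ only changes the loss, so no new vectors need to be stored or trained.
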
Problem

Control multiple behavioral attributes in LLMs
Overcome interference between attributes and the limitations of linear steering
Enable dynamic behavior composition without retraining
Innovation

Non-linear multi-label classifier for hidden activations
Gradient-based intervention directions at inference
Dynamic behavior composition without retraining
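
The three ideas above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the paper's implementation. A small randomly initialized tanh MLP stands in for the trained multi-label classifier. The steering objective is assumed to be the negative log-probability summed over the requested attributes, and each step has a fixed length along the normalized negative gradient. The architecture, loss, step size, and the model layer where K-Steering applies the update are not specified in this summary. All names (`attribute_probs`, `target_loss`, `loss_gradient`, `steer`) are hypothetical.

```python
import numpy as np

# Hypothetical toy stand-in for the trained multi-label attribute classifier.
# In K-Steering the classifier is trained on real LLM hidden activations; here the
# weights are random, only to show how a gradient-based steering direction is computed.
rng = np.random.default_rng(0)
D, H, K = 16, 32, 4                                  # activation dim, MLP width, attributes
W1 = rng.normal(size=(H, D)) / np.sqrt(D)
b1 = np.zeros(H)
W2 = rng.normal(size=(K, H)) / np.sqrt(H)
b2 = np.zeros(K)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attribute_probs(h):
    """Independent per-attribute probabilities p_k(h) = sigmoid(f(h)_k)."""
    return sigmoid(W2 @ np.tanh(W1 @ h + b1) + b2)


def target_loss(h, targets):
    """Assumed objective L(h) = -sum_{k in targets} log p_k(h); low when all targets are present."""
    return -np.sum(np.log(attribute_probs(h)[targets]))


def loss_gradient(h, targets):
    """dL/dh, computed by manual backpropagation through the toy MLP."""
    a = np.tanh(W1 @ h + b1)
    p = sigmoid(W2 @ a + b2)
    g_logits = np.zeros(K)
    g_logits[targets] = -(1.0 - p[targets])         # d(-log sigmoid(z))/dz = -(1 - sigmoid(z))
    g_pre = (W2.T @ g_logits) * (1.0 - a ** 2)      # back through the output layer and tanh
    return W1.T @ g_pre                             # back through the first layer to h


def steer(h, targets, step_size=0.02, steps=50):
    """Take fixed-length steps along the normalized negative gradient of L."""
    h = h.copy()
    for _ in range(steps):
        g = loss_gradient(h, targets)
        h -= step_size * g / (np.linalg.norm(g) + 1e-12)
    return h


# Usage: steer one activation toward attributes {0, 2} with the same classifier.
h0 = rng.normal(size=D)
h_steered = steer(h0, [0, 2])
print(attribute_probs(h0)[[0, 2]], "->", attribute_probs(h_steered)[[0, 2]])
```

Composing a different behavior only means passing a different `targets` list to `steer`; the classifier is untouched. Because the classifier is non-linear, `loss_gradient` returns a different direction at different activations, unlike a fixed steering vector. Normalizing the gradient keeps the size of each intervention step constant, which is one common design choice; a raw gradient step scaled by a learning rate would also work.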