DISCO: Disentangled Communication Steering for Large Language Models

📅 2025-09-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of output steering during large language model (LLM) inference. We propose DISCO, a novel steering method that, instead of injecting steering vectors into residual streams or attention head outputs, first establishes the superior linear discriminability of query (Q) and value (V) representations and then injects steering vectors directly into the Q/V embedding spaces. This design disentangles the implicit, rigid Q/V modification induced by the conventional approach of steering attention inputs, enabling finer-grained and more interpretable interventions. Experiments on LLaMA 3.1 8B and Gemma 2 9B demonstrate that DISCO significantly outperforms baseline methods across multiple concept-steering tasks, achieving up to a 19.1% absolute improvement in steering efficacy. These results support the Q/V spaces as effective, general-purpose building blocks for controllable LLM inference.

📝 Abstract
A variety of recent methods guide large language model outputs via the inference-time addition of steering vectors to residual-stream or attention-head representations. In contrast, we propose to inject steering vectors directly into the query and value representation spaces within attention heads. We provide evidence that a greater portion of these spaces exhibits high linear discriminability of concepts (a key property motivating the use of steering vectors) than attention head outputs. We analytically characterize the effect of our method, which we term DISentangled COmmunication (DISCO) Steering, on attention head outputs. Our analysis reveals that DISCO disentangles a strong but underutilized baseline, steering attention inputs, which implicitly modifies queries and values in a rigid manner. In contrast, DISCO's direct modulation of these components enables more granular control. We find that DISCO achieves superior performance over a number of steering vector baselines across multiple datasets on LLaMA 3.1 8B and Gemma 2 9B, with steering efficacy scoring up to 19.1% higher than the runner-up. Our results support the conclusion that the query and value spaces are powerful building blocks for steering vector methods.
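The linear-discriminability property motivating steering vectors can be illustrated with a minimal probe: score how well a single linear direction separates a concept in a given representation space. The difference-of-means probe below is an illustrative stand-in; the paper's actual measurement procedure may differ.

```python
import numpy as np

def linear_discriminability(reps, labels):
    """Score linear separability of a binary concept in a representation space.

    Uses a difference-of-means direction as a simple linear probe:
    project each representation onto (mean_pos - mean_neg) and classify
    by a midpoint threshold. Returns classification accuracy in [0, 1].
    """
    pos_mean = reps[labels == 1].mean(axis=0)
    neg_mean = reps[labels == 0].mean(axis=0)
    direction = pos_mean - neg_mean              # candidate steering direction
    threshold = (pos_mean + neg_mean) @ direction / 2
    preds = (reps @ direction > threshold).astype(int)
    return float((preds == labels).mean())

# Synthetic example: two well-separated clusters in an 8-dim space.
rng = np.random.default_rng(0)
reps = np.vstack([rng.normal(1.0, 0.1, (50, 8)),
                  rng.normal(-1.0, 0.1, (50, 8))])
labels = np.array([1] * 50 + [0] * 50)
score = linear_discriminability(reps, labels)
```

In this framing, a space where concept clusters separate along a single direction (high probe accuracy) is exactly one where adding that direction as a steering vector is likely to move outputs toward the concept.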
Problem

Research questions and friction points this paper is trying to address.

Improving control over large language model outputs during inference
Developing more granular steering of attention mechanisms in LLMs
Enhancing concept discriminability in query and value representation spaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Injects steering vectors directly into the query and value spaces within attention heads
Directly modulates attention components for more granular control than input steering
Achieves superior performance over other steering vector baselines
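The core intervention can be sketched as follows: in a single attention head, add steering vectors to the query and value representations after their projections, rather than to the head's input or output. This is a minimal illustrative sketch, not the paper's implementation; the scaling coefficient `alpha` and the choice of steering only Q and V are assumptions drawn from the description above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_qv_steering(X, Wq, Wk, Wv, q_steer=None, v_steer=None, alpha=1.0):
    """Single-head attention with steering applied in the Q/V spaces.

    X: (seq_len, d_model) input; Wq/Wk/Wv: (d_model, d_head) projections.
    q_steer/v_steer: optional (d_head,) steering vectors added directly to
    the query/value representations, decoupled from the input (unlike
    input steering, which shifts Q, K, and V together through the
    projections).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    if q_steer is not None:
        Q = Q + alpha * q_steer      # steer queries: reweights attention
    if v_steer is not None:
        V = V + alpha * v_steer      # steer values: shifts head content
    d_head = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_head))
    return A @ V

# Toy usage: steering changes the head output.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
steer = rng.normal(size=8)
base = attention_with_qv_steering(X, Wq, Wk, Wv)
steered = attention_with_qv_steering(X, Wq, Wk, Wv, q_steer=steer, v_steer=steer)
```

Because Q and V are modified independently here, one can steer what the head attends to (via `q_steer`) separately from what content it emits (via `v_steer`), which is the finer-grained control the summary describes.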