Spherical Steering: Geometry-Aware Activation Rotation for Language Models

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in existing inference-time control methods for language models, which rely on activation vector addition and often disrupt the norm of hidden representations, leading to representation collapse and degraded generation quality. To overcome this, the authors propose a training-free, geometry-aware control method that rotates activation vectors along geodesics on the unit hypersphere to align with target directions, thereby strictly preserving vector norms. Additionally, they introduce a dynamic confidence gating mechanism based on input uncertainty to adaptively balance directional guidance with open-ended generation. By integrating spherical rotation and geodesic interpolation into inference-time control for the first time, the method achieves approximately 10% improvement over additive baselines on benchmarks such as TruthfulQA, COPA, and StoryCloze, while effectively maintaining generation diversity and fluency.

📝 Abstract
Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. However, standard approaches typically rely on activation addition, a geometric operation that inevitably alters the magnitude of hidden representations. This raises concerns about representation collapse and degradation of open-ended generation capabilities. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Rather than shifting activations with a fixed vector, our method rotates them along a geodesic toward a target direction, guiding the activation toward the target concept while preserving the integrity of the signal. To further enhance adaptivity, we incorporate a confidence gate that dynamically modulates steering strength based on input uncertainty. Extensive experiments across multiple-choice benchmarks demonstrate that Spherical Steering significantly outperforms addition-based baselines (by roughly +10% on TruthfulQA, COPA, and StoryCloze), while simultaneously maintaining the model's general open-ended generation quality. This work highlights the value of geometric consistency, suggesting that norm-preserving rotation is a robust and effective primitive for precise inference-time control.
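The core operation described in the abstract — rotating an activation along the geodesic toward a target direction while keeping its norm fixed — can be sketched with spherical linear interpolation (slerp). This is an illustrative reconstruction, not the authors' released code; the function name, the interpolation parameter `alpha`, and the NumPy setting are all assumptions.

```python
import numpy as np

def spherical_steer(h, v, alpha):
    """Rotate activation h along the geodesic toward direction v.

    h     : hidden activation vector
    v     : steering (target concept) direction
    alpha : rotation fraction in [0, 1]; 0 = no steering, 1 = full alignment
            (in the paper this strength would be modulated by a confidence gate)

    Returns a vector with the same norm as h (norm-preserving by construction).
    """
    norm_h = np.linalg.norm(h)
    u = h / norm_h                       # unit-sphere projection of the activation
    w = v / np.linalg.norm(v)            # unit target direction
    cos_t = np.clip(u @ w, -1.0, 1.0)
    theta = np.arccos(cos_t)             # geodesic angle between u and w
    if theta < 1e-6:                     # already aligned; nothing to rotate
        return h.copy()
    # slerp along the great circle from u toward w by fraction alpha
    u_rot = (np.sin((1 - alpha) * theta) * u
             + np.sin(alpha * theta) * w) / np.sin(theta)
    return norm_h * u_rot                # restore the original activation norm
```

Because the interpolation stays on the unit sphere and the original norm is reapplied at the end, the steered activation never shrinks or inflates — the property the paper argues prevents representation collapse under additive steering.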
Problem

Research questions and friction points this paper is trying to address.

inference-time steering
activation addition
representation collapse
open-ended generation
language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spherical Steering
activation rotation
geometry-aware
inference-time control
norm-preserving
Zejia You
Rice University, Tufts University
Chunyuan Deng
Rice University
Hanjie Chen
Rice University
Natural Language Processing · Interpretable Machine Learning