🤖 AI Summary
This work addresses a critical limitation in existing inference-time control methods for language models, which rely on activation vector addition and often disrupt the norm of hidden representations, leading to representation collapse and degraded generation quality. To overcome this, the authors propose a training-free, geometry-aware control method that rotates activation vectors along geodesics on the unit hypersphere to align with target directions, thereby strictly preserving vector norms. Additionally, they introduce a dynamic confidence gating mechanism based on input uncertainty to adaptively balance directional guidance with open-ended generation. By integrating spherical rotation and geodesic interpolation into inference-time control for the first time, the method achieves an approximately 10% improvement over additive baselines on benchmarks such as TruthfulQA, COPA, and StoryCloze, while effectively maintaining generation diversity and fluency.
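The norm-preserving rotation described above can be sketched with spherical linear interpolation (slerp): normalize the activation, rotate it along the geodesic toward the target direction by a fraction of the angle between them, then restore the original magnitude. This is a minimal illustration of the general technique, not the authors' implementation; the function name and the interpolation fraction `alpha` are assumptions for the example.

```python
import numpy as np

def spherical_steer(h, target_dir, alpha):
    """Rotate activation h along the geodesic toward target_dir.

    h          : hidden activation vector (any norm)
    target_dir : unit vector representing the target concept
    alpha      : interpolation fraction in [0, 1]; 0 = no steering,
                 1 = fully aligned with target_dir
    Returns a vector with the same norm as h (norm-preserving).
    """
    norm = np.linalg.norm(h)
    u = h / norm                                  # project onto the unit sphere
    cos_theta = np.clip(np.dot(u, target_dir), -1.0, 1.0)
    theta = np.arccos(cos_theta)                  # geodesic angle to the target
    if theta < 1e-6:                              # already aligned: nothing to do
        return h
    # Spherical linear interpolation (slerp) between u and target_dir
    rotated = (np.sin((1.0 - alpha) * theta) * u
               + np.sin(alpha * theta) * target_dir) / np.sin(theta)
    return norm * rotated                         # restore the original magnitude
```

Because slerp stays on the unit sphere, the output norm always equals the input norm, which is exactly the property an additive steering vector cannot guarantee.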
📝 Abstract
Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. However, standard approaches typically rely on activation addition, a geometric operation that inevitably alters the magnitude of hidden representations. This raises concerns about representation collapse and degradation of open-ended generation capabilities. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Rather than shifting activations with a fixed vector, our method rotates them along a geodesic toward a target direction, guiding the activation toward the target concept while preserving the integrity of the signal. To further enhance adaptivity, we incorporate a confidence gate that dynamically modulates steering strength based on input uncertainty. Extensive experiments across multiple-choice benchmarks demonstrate that Spherical Steering significantly outperforms addition-based baselines (by roughly 10% on TruthfulQA, COPA, and StoryCloze), while simultaneously maintaining the model's general open-ended generation quality. This work highlights the value of geometric consistency, suggesting that norm-preserving rotation is a robust and effective primitive for precise inference-time control.
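The confidence gate modulates steering strength from input uncertainty. One plausible instantiation, sketched below, uses normalized predictive entropy of the next-token distribution as the uncertainty signal; the abstract does not specify the exact gating function, so the mapping (stronger steering when the model is uncertain, weaker when it is confident) and the names `confidence_gate`, `alpha_max`, and `tau` are assumptions for illustration.

```python
import numpy as np

def confidence_gate(logits, alpha_max, tau=1.0):
    """Map next-token uncertainty to a steering strength in [0, alpha_max].

    Assumed sketch: normalized predictive entropy scales the maximum
    steering fraction, so confident (low-entropy) inputs receive weaker
    directional guidance and uncertain inputs receive stronger guidance.
    """
    z = logits / tau
    z = z - z.max()                           # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()           # softmax probabilities
    entropy = -(p * np.log(p + 1e-12)).sum()  # predictive entropy
    max_entropy = np.log(len(p))              # entropy of a uniform distribution
    return alpha_max * entropy / max_entropy  # gated steering strength
```

The gated value can then serve as the interpolation fraction for the geodesic rotation, so that steering backs off on inputs where the model is already confident and open-ended generation is left largely untouched.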