Riemannian-Manifold Steering: Geometry-Aware Generative Autoencoders for Label-Free Steering

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing geometry-aware manifold steering methods require labelled class centroids and a prescribed cyclic or sequential structure, which limits their use in unsupervised settings or on tasks with more complex structure. This work proposes a unified Riemannian framework that recasts language model steering as geodesic computation in activation space. The metric is the output-space Hellinger distance pulled back to activations, approximated by a learned encoder, so guidance is geometry-aware without per-prompt labels or a topology prior. Linear and labelled-spline steering are recovered as geodesics under particular choices of metric. On a standard four-task language-model arithmetic benchmark, the method reliably drives the model onto the target class across all tasks and follows more behaviourally natural trajectories than baselines on smaller output spaces.
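
For readers who want the geometry spelled out, the following is a sketch of what pulling back the Hellinger distance means in standard notation. The symbols h, p(· | h), φ, G, and γ are assumptions introduced here for illustration; the paper's own definitions and normalisation may differ.

```latex
% Assumed notation: h is an activation vector, p(\cdot \mid h) the output
% distribution it induces, and \gamma : [0,1] \to \mathbb{R}^d a path of activations.
\begin{align}
  H(p, q) &= \tfrac{1}{\sqrt{2}} \, \bigl\lVert \sqrt{p} - \sqrt{q} \bigr\rVert_2
    && \text{(Hellinger distance)} \\
  \phi(h) &= \sqrt{p(\cdot \mid h)}, \qquad
  G(h) = \tfrac{1}{2} \, J_\phi(h)^\top J_\phi(h)
    && \text{(pullback metric, } J_\phi \text{ the Jacobian of } \phi\text{)} \\
  E[\gamma] &= \tfrac{1}{2} \int_0^1 \dot\gamma(t)^\top G(\gamma(t)) \, \dot\gamma(t) \, dt
    && \text{(path energy; geodesics minimise it for fixed endpoints)} \\
  G \equiv I \;&\Longrightarrow\; \gamma(t) = (1 - t)\, h_0 + t\, h_1
    && \text{(Euclidean metric recovers linear steering)}
\end{align}
```

The second line follows from the expansion H(p(· | h), p(· | h + dh))² ≈ ½ dhᵀ J_φ(h)ᵀ J_φ(h) dh for a small step dh.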

📝 Abstract

Steering a language model (intervening on its internal activations to change downstream behaviour) has recently expanded beyond linear interpolation to nonlinear methods such as angular and kernelized steering, which define intervention transformations without learning an explicit geometry over paths in activation space. Recently introduced geometry-aware manifold methods do learn such a geometry, but require labelled class centroids together with a prescribed cyclic or sequential structure. These requirements restrict where manifold steering can be applied. We recast manifold steering more broadly as Riemannian geodesic computation on activation space, recovering linear and labelled-spline steering as geodesics under particular choices of metric. A principled metric within this framework is the output-space Hellinger distance pulled back to activations; we approximate this with a learned encoder trained on output distances over a small concept-token schema, with no per-prompt labels, no topology prior, and no per-task curve fitting. Empirically, the method reliably drives the model onto the target class across all tasks in a standard four-task language-model arithmetic benchmark, while following more behaviourally natural trajectories than baselines on smaller output spaces. We thereby provide a unified Riemannian framework for manifold steering together with a schema-supervised, label-free instantiation that operates without labelled centroids or prescribed boundary conditions.
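
As a rough illustration of the quantities the abstract refers to, the sketch below computes the Hellinger distance between output distributions and a discrete path energy for a straight-line (linear steering) path in a toy activation space. Everything here is an assumption made for the example: the toy readout matrix `W`, the helper names, and the discretisation are hypothetical, and the paper's learned encoder is not reproduced.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def readout(h, weights):
    """Toy stand-in for the model: activation -> output distribution."""
    n_out = len(weights[0])
    logits = [sum(hi * row[j] for hi, row in zip(h, weights))
              for j in range(n_out)]
    return softmax(logits)

def linear_path(h0, h1, n_points=16):
    """Straight line from h0 to h1, i.e. plain linear steering."""
    ts = [i / (n_points - 1) for i in range(n_points)]
    return [[(1 - t) * a + t * b for a, b in zip(h0, h1)] for t in ts]

def pullback_energy(path, weights):
    """Discrete path energy measured in output space (up to a constant
    factor): (n - 1) * sum of squared Hellinger steps along the path."""
    probs = [readout(h, weights) for h in path]
    steps = [hellinger(p, q) for p, q in zip(probs, probs[1:])]
    return (len(path) - 1) * sum(d * d for d in steps)

# Hypothetical 2-d activations and a 2x3 readout matrix.
W = [[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]]
h0, h1 = [1.0, 0.0], [0.0, 1.0]
path = linear_path(h0, h1)
energy = pullback_energy(path, W)
endpoint_gap = hellinger(readout(h0, W), readout(h1, W))
print(f"energy={energy:.4f}, squared endpoint distance={endpoint_gap ** 2:.4f}")
```

By the triangle and Cauchy-Schwarz inequalities this discrete energy is never smaller than the squared Hellinger distance between the endpoint distributions. A geodesic solver would adjust the interior points of `path` to lower the energy, whereas linear steering keeps the straight line regardless of how the output distribution changes along it.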

Problem

Research questions and friction points this paper is trying to address.

manifold steering

label-free

Riemannian geometry

language model intervention

activation space

Innovation

Methods, ideas, or system contributions that make the work stand out.

Riemannian manifold

geodesic steering

label-free intervention