Riemannian-Manifold Steering: Geometry-Aware Generative Autoencoders for Label-Free Steering

📅 2026-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing geometry-aware manifold steering methods rely on labelled class centroids and a prescribed cyclic or sequential structure, which limits their use in unsupervised settings or on tasks without such structure. This work proposes a unified Riemannian framework that recasts language model steering as geodesic computation in activation space. The metric is the output-space Hellinger distance pulled back to activations, approximated by a learned encoder trained on output distances over a small concept-token schema, so no per-prompt labels or topology prior are needed. Linear and labelled-spline steering are recovered as geodesics under particular choices of metric. On a standard four-task language-model arithmetic benchmark, the method reliably drives the model onto the target class across all tasks and follows more behaviourally natural trajectories than baselines on smaller output spaces.
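To make the "linear steering as a geodesic" claim concrete, the following is a minimal sketch, not the paper's implementation. It assumes a discretised path energy (segment count times the sum of squared segment lengths measured under a metric) and checks that, under the identity (Euclidean) metric, straight-line interpolation between two activations has lower energy than a detoured path. The names `path_energy`, `linear_path`, and `identity_metric` are hypothetical and introduced only for illustration.

```python
def identity_metric(h):
    """Euclidean metric: the identity matrix at every point (illustrative assumption)."""
    d = len(h)
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def path_energy(points, metric):
    """Discrete path energy: n * sum_k delta_k^T G(midpoint_k) delta_k over n segments."""
    n = len(points) - 1
    total = 0.0
    for a, b in zip(points[:-1], points[1:]):
        delta = [y - x for x, y in zip(a, b)]
        mid = [(x + y) / 2.0 for x, y in zip(a, b)]
        G = metric(mid)  # metric evaluated at the segment midpoint
        d = len(delta)
        total += sum(delta[i] * G[i][j] * delta[j] for i in range(d) for j in range(d))
    return n * total


def linear_path(start, end, n_segments):
    """Straight-line interpolation between two activations (linear steering)."""
    return [
        [s + (e - s) * k / n_segments for s, e in zip(start, end)]
        for k in range(n_segments + 1)
    ]


# Usage: compare a straight path with one whose midpoint has been pushed aside.
start, end = [0.0, 0.0], [3.0, 4.0]
straight = linear_path(start, end, 4)
bent = [list(p) for p in straight]
bent[2] = [bent[2][0] + 1.0, bent[2][1] - 1.0]
straight_energy = path_energy(straight, identity_metric)
bent_energy = path_energy(bent, identity_metric)
```

Swapping `identity_metric` for a position-dependent metric changes which path minimises the energy, which is the sense in which a learned metric can bend steering trajectories away from straight lines.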
📝 Abstract
Steering a language model, that is, intervening on its internal activations to change downstream behaviour, has recently expanded beyond linear interpolation to nonlinear methods such as angular and kernelized steering, which define intervention transformations without learning an explicit geometry over paths in activation space. Recently introduced geometry-aware manifold methods do learn such a geometry, but require labelled class centroids together with a prescribed cyclic or sequential structure. These requirements, labelled centroids and compatible boundary conditions, restrict where manifold steering can be applied. We recast manifold steering more broadly as Riemannian geodesic computation on activation space, recovering linear and labelled-spline steering as geodesics under particular choices of metric. A principled metric within this framework is the output-space Hellinger distance pulled back to activations; we approximate this with a learned encoder trained on output distances over a small concept-token schema, with no per-prompt labels, no topology prior, and no per-task curve fitting. Empirically, the method reliably drives the model onto the target class across all tasks in a standard four-task language-model arithmetic benchmark, while following more behaviourally natural trajectories than baselines on smaller output spaces. We thereby provide a unified Riemannian framework for manifold steering together with a schema-supervised, label-free instantiation that operates without labelled centroids or prescribed boundary conditions.
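The idea of pulling the output-space Hellinger distance back to activations can be illustrated with a small sketch. It is not the paper's method: the paper approximates the metric with a learned encoder, whereas this example assumes a hypothetical toy readout `softmax(W h)` and estimates the pullback metric by finite differences. It relies on the fact that the Hellinger distance equals the Euclidean distance between the vectors sqrt(p / 2), so the pulled-back metric is G = JᵀJ, with J the Jacobian of that embedding with respect to the activation. All function names and the matrix `W` are assumptions made for illustration.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def readout(h, W):
    """Hypothetical toy map from an activation h to output probabilities."""
    return softmax([sum(w * x for w, x in zip(row, h)) for row in W])


def sqrt_embedding(h, W):
    """s(h) = sqrt(p(h) / 2); Euclidean distance between s values is the Hellinger distance."""
    return [math.sqrt(p / 2.0) for p in readout(h, W)]


def pullback_metric(h, W, eps=1e-5):
    """Finite-difference estimate of G = J^T J, where J = ds/dh."""
    d = len(h)
    cols = []  # cols[i] is the i-th column of J
    for i in range(d):
        h_plus, h_minus = list(h), list(h)
        h_plus[i] += eps
        h_minus[i] -= eps
        s_plus, s_minus = sqrt_embedding(h_plus, W), sqrt_embedding(h_minus, W)
        cols.append([(a - b) / (2 * eps) for a, b in zip(s_plus, s_minus)])
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(d)] for i in range(d)]


def squared_length(delta, G):
    """Squared length of a small activation step delta under the metric G."""
    d = len(delta)
    return sum(delta[i] * G[i][j] * delta[j] for i in range(d) for j in range(d))


# Usage: for a small step, delta^T G delta approximates the squared Hellinger
# distance between the output distributions before and after the step.
W = [[1.0, -0.5], [0.2, 0.8], [-0.7, 0.3]]
h = [0.4, -0.1]
delta = [1e-3, -2e-3]
G = pullback_metric(h, W)
local = squared_length(delta, G)
h_moved = [a + b for a, b in zip(h, delta)]
exact = hellinger(readout(h, W), readout(h_moved, W)) ** 2
```

Under such a metric, steps in activation space are measured by how much they change the output distribution, so directions that leave the outputs unchanged have near-zero length.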
Problem

Research questions and friction points this paper is trying to address.

manifold steering
label-free
Riemannian geometry
language model intervention
activation space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Riemannian manifold
geodesic steering
label-free intervention
Hellinger distance
generative autoencoder