Conceptors for Semantic Steering

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0

career value

190K/year

📝 Abstract

Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional while producing substantially fewer degenerate outputs. Conceptor steering is a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.

Problem

Research questions and friction points this paper is trying to address.

semantic steering

concept representation

activation geometry

multidimensional subspace

LLM control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conceptors

Semantic Steering

Multidimensional Subspace