Conceptors for Semantic Steering

📅 2026-05-06

📝 Abstract
Activation-based steering offers control over LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows that the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations of up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond layer selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT), and we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional, while producing substantially fewer degenerate outputs. Conceptor steering is thus a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.
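To make the terms in the abstract concrete, here is a minimal NumPy sketch of the standard conceptor definitions from the conceptor literature: the soft projection C = R (R + α⁻² I)⁻¹ built from a correlation matrix R, the quota trace(C)/d, and the Boolean operations. The abstract does not give the paper's exact estimator, aperture setting, or steering rule, so the aperture value, the synthetic "activations", the helper names, and the plain `C @ h` steering step are all assumptions made for illustration.

```python
import numpy as np

def correlation(X):
    """Uncentered correlation matrix of activations X with shape (n_samples, d)."""
    return X.T @ X / X.shape[0]

def conceptor_from_corr(R, aperture):
    """Standard conceptor: C = R (R + aperture^-2 I)^-1."""
    d = R.shape[0]
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def conceptor(X, aperture):
    """Conceptor estimated directly from a batch of activations."""
    return conceptor_from_corr(correlation(X), aperture)

def quota(C):
    """Mean eigenvalue of C: the fraction of the space the conceptor occupies."""
    return np.trace(C) / C.shape[0]

def NOT(C):
    return np.eye(C.shape[0]) - C

def AND(C, B):
    """(C^-1 + B^-1 - I)^-1; this form assumes C and B are both invertible."""
    I = np.eye(C.shape[0])
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def OR(C, B):
    """De Morgan: C OR B = NOT(NOT C AND NOT B)."""
    return NOT(AND(NOT(C), NOT(B)))

# Synthetic stand-ins for hidden states at one layer (assumption: d = 4,
# two poles of a bipolar concept shifted in opposite directions).
rng = np.random.default_rng(0)
d, aperture = 4, 2.0
pos = rng.normal(size=(100, d)) + 1.0
neg = rng.normal(size=(100, d)) - 1.0

R_pos, R_neg = correlation(pos), correlation(neg)
C_pos = conceptor_from_corr(R_pos, aperture)
C_neg = conceptor_from_corr(R_neg, aperture)

# Bipolar conceptor: estimated from activations pooled across both poles.
C_pooled = conceptor(np.vstack([pos, neg]), aperture)
q = quota(C_pooled)

# Hypothetical steering step: softly project a hidden state through C.
h = rng.normal(size=d)
h_steered = C_pooled @ h
```

In this sketch the quota lies strictly between 0 and 1, and a layer-selection heuristic in the spirit of the abstract would compare `quota` values of conceptors computed at different layers. The OR of two conceptors with the same aperture equals the conceptor of the summed correlation matrices, which is one way to sanity-check an implementation.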
Problem

Research questions and friction points this paper is trying to address.

semantic steering
concept representation
activation geometry
multidimensional subspace
LLM control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conceptors
Semantic Steering
Multidimensional Subspace
Boolean Algebra
Activation-based Control