Contrastive Conceptor Activation Steering (COAST): Unlocking Vision-Language-Action Models through Hidden States

📅 2026-05-16

🤖 AI Summary
Existing Vision-Language-Action (VLA) models are brittle in robotic tasks and often fail at even simple operations. This work proposes Contrastive Conceptor Activation Steering (COAST), a training-free method that identifies success-critical latent subspaces from a few successful and failed rollouts and steers the VLA model's internal representations toward the target task's success subspace at inference time. COAST builds on conceptors, linear operators that soft-project data into the principal components of a target distribution, and uses them to steer the residual stream. The resulting subspace geometry shows that failure modes share substantial structure across tasks, whereas success representations are largely task-specific. Across three architecturally distinct policies (flow-matching VLA, autoregressive VLA, and Diffusion Policy), COAST improves absolute mean task success rate by over 20% in simulation and over 40% on real robots. When tasks share similar failure modes, previously fitted conceptors also improve performance on new tasks without refitting.
📝 Abstract
Vision-Language-Action (VLA) models leverage powerful perceptual priors from web-scale Vision-Language Model (VLM) pre-training, yet they remain surprisingly brittle in practice, frequently failing at simple robotic tasks. To mitigate this, we propose Contrastive Conceptor Activation Steering (COAST). COAST builds on the notion of a "conceptor", a linear operator that soft-projects data into the principal components of a target distribution. COAST uses conceptors to identify success-critical subspaces for a target robotic task from a few examples of success and failure rollouts. At inference time, it steers VLA latents into these identified success subspaces to improve task outcomes. Across three architecturally distinct neural policies (flow-matching VLA, autoregressive VLA, and Diffusion Policy), COAST improves absolute mean task success rate by over 20% in simulation and over 40% on real robots. The activation subspace geometry reveals that failure modes share substantial structure across tasks while success representations remain largely task-specific. When tasks share similar failure modes, this structure enables previously fitted conceptors to improve performance on new tasks without refitting. Ultimately, our results suggest that current VLAs retain substantial task-relevant knowledge in their latent representations, and that the action expert's decoding bottleneck could be mitigated by steering its residual stream toward task-relevant subspaces. COAST provides a lightweight, training-free path to unlocking these latent capabilities by steering the model towards its own "success" distributions.
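The abstract describes conceptors only in words, so a small numerical sketch may help make the idea concrete. The block below uses the standard conceptor formula C = R (R + α⁻² I)⁻¹, where R is the correlation matrix of the activations and α is the aperture. Everything else is an assumption made for illustration and is not taken from the paper: the function names `fit_conceptor`, `contrast_conceptors`, and `steer` are hypothetical, the contrast is built with the conceptor-logic "success AND NOT failure" rule, the steering update is a simple interpolation with strength `beta`, and the data are synthetic 4-dimensional vectors rather than real VLA hidden states.

```python
import numpy as np


def fit_conceptor(X, aperture=1.0):
    """Standard conceptor C = R (R + aperture^-2 I)^-1 for activations X of shape (n, d)."""
    n, d = X.shape
    R = X.T @ X / n  # (d, d) correlation matrix of the activations
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))


def contrast_conceptors(C_success, C_failure):
    """Conceptor-logic 'success AND NOT failure' (an assumed choice, not the paper's).

    NOT C = I - C, and A AND B = (A^-1 + B^-1 - I)^-1, valid when A and B are invertible.
    """
    I = np.eye(C_success.shape[0])
    not_failure = I - C_failure
    return np.linalg.inv(np.linalg.inv(C_success) + np.linalg.inv(not_failure) - I)


def steer(h, C, beta=1.0):
    """Move a hidden state h toward its soft projection C @ h; beta=0 leaves h unchanged."""
    return h + beta * (C @ h - h)


# Toy data (assumption): "success" activations vary mostly along axis 0,
# "failure" activations vary mostly along axis 1.
rng = np.random.default_rng(0)
success = rng.standard_normal((2000, 4)) * np.array([3.0, 0.1, 0.1, 0.1])
failure = rng.standard_normal((2000, 4)) * np.array([0.1, 3.0, 0.1, 0.1])

C_success = fit_conceptor(success)
C_failure = fit_conceptor(failure)
C_contrast = contrast_conceptors(C_success, C_failure)

h = np.ones(4)
h_steered = steer(h, C_contrast, beta=1.0)
print(np.round(np.diag(C_contrast), 3))  # soft-projection weight per axis
print(np.round(h_steered, 3))            # steered hidden state
```

In this toy setup the contrastive conceptor keeps most of the success-dominated axis and suppresses the failure-dominated one. Applied to a real policy, `X` would presumably hold hidden states collected from success and failure rollouts, and `steer` would run on the chosen layer's activations at inference time; how COAST selects layers, apertures, and steering strength is not specified in this summary.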
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
robotic task failure
latent representation brittleness
action decoding bottleneck
task success generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conceptor
Activation Steering
Vision-Language-Action Models
Latent Subspace
Zero-shot Transfer