SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of achieving fine-grained instruction following in vision-language models (VLMs) without modifying their pretrained weights. We propose a lightweight activation-guidance module that dynamically modulates semantic interactions between visual and linguistic modalities via dimension-wise activation modulation and cross-layer adaptive guidance—requiring no predefined intervention layers or static control vectors, and introducing only 0.14% additional parameters. Our method learns latent-space embeddings for target and counterfactual behaviors using a novel multimodal dataset, VNIA, specifically curated for training and evaluation. Experiments demonstrate that our approach significantly outperforms existing intervention methods on instruction-following and hallucination suppression tasks, while preserving performance on non-target tasks. This validates activation engineering as an effective paradigm for controllable multimodal reasoning.

📝 Abstract
This work introduces SteerVLM, a lightweight steering module that guides Vision-Language Models (VLMs) toward outputs that better adhere to desired instructions. Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust the activations connecting the language modality with image context. This enables fine-grained, inference-time control over complex output semantics without modifying model weights, while preserving performance on off-target tasks. The module adds learnable parameters equal to only 0.14% of the original VLM's size and achieves model control through dimension-wise activation modulation and adaptive steering across layers, without requiring pre-extracted static vectors or manual tuning of intervention points. Furthermore, we introduce VNIA (Visual Narrative Intent Alignment), a multimodal dataset created specifically to facilitate the development and evaluation of VLM steering techniques. Our method outperforms existing intervention techniques on steering and hallucination-mitigation benchmarks for VLMs and offers a robust solution for multimodal model control through activation engineering.
Problem

Research questions and friction points this paper is trying to address.

Lightweight module guides Vision-Language Models to follow instructions
Dynamically adjusts activations for fine-grained output control
Mitigates model hallucinations while preserving off-task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight steering module for vision-language models
Dynamic activation adjustment using latent embeddings
Dimension-wise modulation without weight modification
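The bullets above describe steering as a dimension-wise modulation of activations using embeddings of target and converse behaviors. The paper's code is not shown here; the following is a minimal NumPy sketch of what such an intervention could look like, with all names (`steer_activations`, `gate`, `alpha`) hypothetical and the learned components replaced by fixed toy values:

```python
import numpy as np

def steer_activations(h, v_target, v_converse, gate, alpha=1.0):
    """Illustrative dimension-wise activation steering.

    h:          hidden activations at one layer, shape (d,)
    v_target:   latent embedding of the desired behavior, shape (d,)
    v_converse: latent embedding of the opposite behavior, shape (d,)
    gate:       per-dimension modulation weights in [0, 1], shape (d,)
                (learned in the paper; fixed here for illustration)
    alpha:      global steering strength
    """
    direction = v_target - v_converse      # behavior-contrast direction
    return h + alpha * gate * direction    # modulate each dimension separately

# Toy example with d = 4: steer only the first two dimensions.
rng = np.random.default_rng(0)
h = rng.normal(size=4)
v_t = np.array([1.0, 0.0, 0.0, 0.0])
v_c = np.array([0.0, 1.0, 0.0, 0.0])
gate = np.array([1.0, 1.0, 0.0, 0.0])
steered = steer_activations(h, v_t, v_c, gate, alpha=0.5)
```

In the actual method, the gate and steering strength would be produced adaptively per layer by the learned module rather than set by hand, which is what removes the need for pre-extracted static vectors or manually chosen intervention points.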
Anushka Sivakumar
Department of Computer Science, Virginia Tech
Andrew Zhang
PhD student, Harvard & MIT
computer vision, artificial intelligence, healthcare, medical devices, neuroscience
Zaber Hakim
Department of Computer Science, Virginia Tech
Chris Thomas
Virginia Tech
Computer Vision