One-shot Optimized Steering Vector for Hallucination Mitigation for VLMs

📅 2026-01-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the persistent challenges of hallucination and safety in vision-language models (VLMs) by proposing OSGA, a novel framework that introduces the first single-sample, universal steering vector aligned with semantic intent. OSGA generates an input-agnostic guidance signal from just one example, which—when injected into specific layers during inference—effectively mitigates hallucinations and enhances model safety without altering model parameters. The approach integrates variance-aware data selection, a contrastive learning objective, and generative anchor regularization to substantially reduce deployment overhead. Experimental results demonstrate that a single OSGA vector consistently improves safety and reliability across multiple benchmarks, with negligible computational cost.
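The summary's core mechanism, injecting a fixed, input-agnostic vector into a chosen layer's hidden states at inference time without touching model weights, can be sketched in plain Python. All names, the dimensionality, and the scaling factor `alpha` below are illustrative assumptions, not details from the paper:

```python
# Minimal sketch of activation steering: shift a layer's hidden state
# by a fixed steering vector at inference time. The scaling factor
# `alpha` and all values are illustrative, not from the paper.

def apply_steering(hidden_state, steering_vector, alpha=1.0):
    """Return hidden_state shifted by alpha * steering_vector."""
    if len(hidden_state) != len(steering_vector):
        raise ValueError("hidden state and steering vector dimensions differ")
    return [h + alpha * v for h, v in zip(hidden_state, steering_vector)]

# Example: a 4-dimensional hidden state nudged along a fixed direction.
hidden = [0.5, -1.0, 0.25, 0.0]
vector = [0.5, 0.25, -0.25, 0.0]
print(apply_steering(hidden, vector, alpha=2.0))  # [1.5, -0.5, -0.25, 0.0]
```

In a real VLM this addition would be registered as a forward hook on one transformer layer, so the same vector is reused for every input at negligible cost.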

📝 Abstract
Vision Language Models (VLMs) achieve strong performance on multimodal tasks but still suffer from hallucination and safety-related failures that persist even at scale. Steering offers a lightweight technique for improving model behavior; however, existing steering approaches, whether input-dependent or input-independent, struggle to achieve a meaningful trade-off between efficiency and effectiveness. In this work, we observe that steering vectors can generalize across inputs when tasks share aligned semantic intent. Based on this insight, we propose OSGA (One-shot Steering with Generative Anchor), an input-independent framework that improves model performance with a single optimization instance. OSGA first selects an informative sample via a variance-based data selection strategy, then learns a single steering vector using a contrastive objective regularized by a generative anchor. The resulting vector can be applied universally at a fixed layer during inference without modifying model parameters. Experiments across multiple benchmarks show that a single OSGA-optimized steering vector consistently improves hallucination mitigation and safety with negligible overhead, highlighting one-shot steering as a practical and scalable solution for reliable VLMs.
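The variance-based selection step from the abstract, choosing the single most informative sample before optimizing the steering vector, might look like the following sketch. The variance criterion, the mocked activation values, and all names here are assumptions inferred from the abstract, not the paper's exact statistic:

```python
# Hypothetical sketch of variance-based data selection: score each
# candidate sample by the variance of its (mocked) activation values
# and keep the highest-variance one. The criterion is an assumption
# inferred from the abstract, not the paper's exact statistic.

def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_sample(candidates):
    """candidates: dict mapping sample id -> list of activation values.
    Returns the id whose activations have the highest variance."""
    return max(candidates, key=lambda k: variance(candidates[k]))

activations = {
    "sample_a": [0.1, 0.1, 0.1, 0.1],    # near-constant: uninformative
    "sample_b": [1.0, -1.0, 1.0, -1.0],  # high spread: informative
    "sample_c": [0.2, 0.0, 0.2, 0.0],
}
print(select_sample(activations))  # sample_b
```

The one selected sample then supplies the positive/negative pair for the contrastive objective, which is what keeps OSGA's optimization cost to a single instance.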
Problem

Research questions and friction points this paper is trying to address.

hallucination
safety
Vision Language Models
steering
reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

one-shot steering
steering vector
hallucination mitigation
vision-language models
generative anchor