VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing

📅 2026-04-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the critical issue of object hallucination in large vision-language models (LVLMs), which undermines reliability in high-stakes applications such as medical imaging and autonomous driving. The authors propose Visual Contrastive Editing (VCE), a zero-cost, label-free post-hoc intervention that precisely identifies and suppresses hallucinatory subspaces by analyzing activation patterns under visual perturbations and applying singular value decomposition (SVD). Notably, VCE operates without model fine-tuning or additional annotations. Experimental results demonstrate that VCE significantly reduces object hallucination rates across multiple benchmarks while preserving the model's original inference efficiency, making it well-suited for deployment in resource-constrained real-world settings.

📝 Abstract
Large vision-language models (LVLMs) frequently suffer from Object Hallucination (OH), wherein they generate descriptions containing objects that are not actually present in the input image. This phenomenon is particularly problematic in real-world applications such as medical imaging and autonomous driving, where accuracy is critical. Recent studies suggest that the hallucination problem may stem from language priors: biases learned during pretraining that cause LVLMs to generate words based on their statistical co-occurrence. To mitigate this problem, we propose Visual Contrastive Editing (VCE), a novel post-hoc method that identifies and suppresses hallucinatory tendencies by analyzing the model's response to contrastive visual perturbations. Using Singular Value Decomposition (SVD), we decompose the model's activation patterns to isolate hallucination subspaces and apply targeted parameter edits to attenuate their influence. Unlike existing approaches that require fine-tuning or labeled data, VCE operates as a label-free intervention, making it both scalable and practical for deployment in resource-constrained settings. Experimental results demonstrate that VCE effectively reduces object hallucination across multiple benchmarks while maintaining the model's original computational efficiency.
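The abstract does not spell out the editing procedure, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions. It assumes that the hallucination subspace is spanned by the top-k right singular vectors of the differences between activations on perturbed and clean images, and that the edit removes that subspace from the output of one weight matrix. The function names, the choice of layer, and the `strength` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def hallucination_subspace(clean_acts, perturbed_acts, k):
    """Estimate a k-dimensional subspace from contrastive activations.

    clean_acts, perturbed_acts: arrays of shape (n_samples, d) holding
    activations for the same inputs with clean and perturbed images.
    Returns an orthonormal basis of shape (k, d).
    Assumption: the top-k right singular vectors of the activation
    differences span the directions to suppress.
    """
    diffs = perturbed_acts - clean_acts
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]

def edit_weight(weight, basis, strength=1.0):
    """Attenuate the subspace in the output of a weight matrix.

    weight: shape (d, d_in), so that weight @ x lives in the activation space.
    basis: orthonormal rows, shape (k, d).
    Returns (I - strength * B^T B) @ weight. With strength=1.0 the edited
    output has no component along the basis directions.
    """
    d = weight.shape[0]
    projector = np.eye(d) - strength * basis.T @ basis
    return projector @ weight

# Synthetic demonstration: plant one direction in the activation differences.
rng = np.random.default_rng(0)
d, n = 16, 64
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
clean = rng.normal(size=(n, d))
perturbed = clean + 3.0 * rng.normal(size=(n, 1)) * direction \
    + 0.01 * rng.normal(size=(n, d))

basis = hallucination_subspace(clean, perturbed, k=1)
weight = rng.normal(size=(d, 8))
edited = edit_weight(weight, basis)
```

Because the projection is folded into the stored weights once, the edited model runs the same matrix multiplications as before, which is consistent with the abstract's claim that inference cost is unchanged.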
Problem

Research questions and friction points this paper is trying to address.

Object Hallucination
Large Vision-Language Models
Language Priors
Visual-Language Understanding
Model Hallucination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Contrastive Editing
Object Hallucination
Singular Value Decomposition
Language Priors
Zero-cost Mitigation