Seeing It or Not? Interpretable Vision-aware Latent Steering to Mitigate Object Hallucinations

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large Vision-Language Models (LVLMs) suffer from object hallucination (OH), where generated text contradicts image content, and their underlying visual decision-making mechanisms remain poorly understood. To address this, the paper proposes VaLSe, a Vision-aware Latent Steering framework that introduces a two-stage interpretation-then-mitigation paradigm for OH. First, a gradient-weighted variant of class activation mapping produces fine-grained visual contribution maps that trace which image regions influence each output token. Second, the model's latent representations are steered toward the vision-aware focus regions those maps reveal, suppressing hallucinated outputs. Evaluated across multiple benchmarks, VaLSe reduces OH rates by 23.6% on average, delivers high-fidelity visual explanations, uncovers limitations in existing OH evaluation metrics, and motivates more visually grounded and interpretable assessment standards.
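The gradient-weighted class-activation idea behind the contribution maps can be sketched generically as follows. This is a standard Grad-CAM-style computation over mock tensors, not the authors' implementation; the feature shapes and the random stand-in activations/gradients are illustrative assumptions:

```python
import numpy as np

def contribution_map(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM-style map: weight each feature channel by the spatially
    averaged gradient of the target token's logit, then ReLU and normalize."""
    # activations, gradients: (C, H, W) from a visual encoder layer
    weights = gradients.mean(axis=(1, 2))          # (C,) per-channel importance
    cam = np.einsum("c,chw->hw", weights, activations)
    cam = np.maximum(cam, 0.0)                     # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                      # scale to [0, 1]
    return cam

# Toy example: random tensors standing in for real encoder features and
# gradients of one output token's logit.
rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 14, 14))
grads = rng.standard_normal((8, 14, 14))
heatmap = contribution_map(acts, grads)
```

In practice the per-token gradients would be captured with backward hooks on the vision encoder; the resulting heatmap localizes which image regions drove that token.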

📝 Abstract
Large Vision-Language Models (LVLMs) have achieved remarkable success but continue to struggle with object hallucination (OH), generating outputs inconsistent with visual inputs. While previous work has proposed methods to reduce OH, the visual decision-making mechanisms that lead to hallucinations remain poorly understood. In this paper, we propose VaLSe, a Vision-aware Latent Steering framework that adopts an interpretation-then-mitigation strategy to address OH in LVLMs. By tackling dual challenges of modeling complex vision-language interactions and eliminating spurious activation artifacts, VaLSe can generate visual contribution maps that trace how specific visual inputs influence individual output tokens. These maps reveal the model's vision-aware focus regions, which are then used to perform latent space steering, realigning internal representations toward semantically relevant content and reducing hallucinated outputs. Extensive experiments demonstrate that VaLSe is a powerful interpretability tool and an effective method for enhancing model robustness against OH across multiple benchmarks. Furthermore, our analysis uncovers limitations in existing OH evaluation metrics, underscoring the need for more nuanced, interpretable, and visually grounded OH benchmarks in future work. Code is available at: https://github.com/Ziwei-Zheng/VaLSe.
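Latent-space steering of the kind the abstract describes can be sketched as a simple hidden-state update. The construction below is a hedged toy: the steering direction is just the difference between a vision-grounded reference representation and the current hidden state, and `alpha` is a hypothetical strength knob; the paper's actual steering vectors are derived from its visual contribution maps and are not reproduced here:

```python
import numpy as np

def steer_hidden(hidden: np.ndarray, vision_focused: np.ndarray,
                 alpha: float = 0.1) -> np.ndarray:
    """Nudge a hidden state toward a vision-grounded reference representation.

    hidden, vision_focused: (d,) vectors from the same layer; alpha controls
    how far the state is realigned toward visually supported content.
    """
    direction = vision_focused - hidden
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        return hidden
    return hidden + alpha * direction / norm   # unit-direction step of size alpha

# Toy example with random vectors standing in for LVLM hidden states.
rng = np.random.default_rng(1)
h = rng.standard_normal(16)
v = rng.standard_normal(16)
steered = steer_hidden(h, v, alpha=0.5)
```

The appeal of this family of interventions is that it edits internal representations at inference time, requiring no retraining of the model.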
Problem

Research questions and friction points this paper is trying to address.

Reducing object hallucination in Large Vision-Language Models
Understanding visual decision-making mechanisms causing hallucinations
Improving interpretability and robustness of LVLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-aware Latent Steering framework
Interpretation-then-mitigation strategy
Generates visual contribution maps
👥 Authors

Boxu Chen
Xi'an Jiaotong University

Ziwei Zheng
Xi'an Jiaotong University
Dynamic Neural Network

Le Yang
Xi'an Jiaotong University

Zeyu Geng
Xi'an Jiaotong University

Zhengyu Zhao
Xi'an Jiaotong University, China
Adversarial Machine Learning, Computer Vision

Chenhao Lin
Xi'an Jiaotong University

Chao Shen
Xi'an Jiaotong University