🤖 AI Summary
Large Vision-Language Models (LVLMs) suffer from object hallucination (OH), where generated text contradicts image content, and their underlying visual decision-making mechanisms remain poorly understood. To address this, we propose VaLSe, a Vision-aware Latent Steering framework that introduces an explain-then-mitigate two-stage paradigm for OH. First, a gradient-weighted variant of class activation mapping generates fine-grained visual contribution maps, attributing cross-modal attention and localizing the image regions that influence each output token. Second, these vision-aware focus regions guide latent-space steering, realigning internal representations toward semantically relevant content to suppress hallucinated outputs. VaLSe is the first interpretability-driven latent-space intervention method designed specifically for OH. Evaluated across multiple benchmarks, it reduces OH rates by 23.6% on average, delivers faithful visual explanations, uncovers critical limitations in existing evaluation metrics, and motivates the development of more visually grounded, interpretable assessment standards.
📝 Abstract
Large Vision-Language Models (LVLMs) have achieved remarkable success but continue to struggle with object hallucination (OH), generating outputs inconsistent with visual inputs. While previous work has proposed methods to reduce OH, the visual decision-making mechanisms that lead to hallucinations remain poorly understood. In this paper, we propose VaLSe, a Vision-aware Latent Steering framework that adopts an interpretation-then-mitigation strategy to address OH in LVLMs. By tackling the dual challenges of modeling complex vision-language interactions and eliminating spurious activation artifacts, VaLSe can generate visual contribution maps that trace how specific visual inputs influence individual output tokens. These maps reveal the model's vision-aware focus regions, which are then used to perform latent-space steering, realigning internal representations toward semantically relevant content and reducing hallucinated outputs. Extensive experiments demonstrate that VaLSe is both a powerful interpretability tool and an effective method for enhancing model robustness against OH across multiple benchmarks. Furthermore, our analysis uncovers limitations in existing OH evaluation metrics, underscoring the need for more nuanced, interpretable, and visually grounded OH benchmarks in future work. Code is available at: https://github.com/Ziwei-Zheng/VaLSe.
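To make the two stages concrete, here is a minimal NumPy sketch of the general pattern the abstract describes: a Grad-CAM-style contribution map that weights visual activations by their gradients, followed by a latent-space steering step that shifts a hidden state toward a vision-focused representation. All function names, shapes, and the `alpha` strength parameter are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def contribution_map(activations, gradients):
    """Grad-CAM-style visual contribution map (illustrative sketch).

    activations, gradients: (C, H, W) arrays for one output token
    (assumed shapes, not the paper's exact tensors).
    Each channel is weighted by its spatially pooled gradient; only
    positive evidence is kept, then the map is normalized to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))            # (C,) channel weights
    cam = np.tensordot(weights, activations, axes=1)  # (H, W) weighted sum
    cam = np.maximum(cam, 0.0)                        # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

def latent_steer(hidden, h_focused, alpha=1.0):
    """Shift a hidden state toward a vision-focused representation.

    h_focused stands in for the representation obtained when the model
    attends only to its vision-aware focus regions (hypothetical input);
    alpha controls the steering strength.
    """
    direction = h_focused - hidden
    norm = np.linalg.norm(direction)
    if norm == 0:
        return hidden
    return hidden + alpha * direction / norm
```

In this sketch, steering moves the hidden state a fixed distance `alpha` along the normalized direction toward the vision-focused state; a real system would apply this at selected transformer layers during decoding.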