SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination

πŸ“… 2025-12-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Multimodal large language models (MLLMs) suffer from language priors and degraded visual representations, which lead to object hallucination. To address this, we propose a training-free, feature-guided mechanism: first, a binary object-existence question-answering probe localizes the critical visual understanding features in the sparse autoencoder (SAE) latent space; second, steering along these features dynamically modulates the model's attention to suppress hallucinated generations. The method generalizes across models and layers without fine-tuning, achieving a 10.0-point improvement on CHAIR_S and consistent gains on POPE and MMHal-Bench, demonstrating both effectiveness and robustness. Core contributions: (i) the first interpretable, probe-based hallucination detection framework designed specifically for MLLMs; and (ii) a zero-shot visual enhancement paradigm guided by SAE latent features. The approach bridges interpretability and hallucination mitigation without architectural or parametric modifications.

πŸ“ Abstract
Although Multimodal Large Language Models (MLLMs) have advanced substantially, they remain vulnerable to object hallucination caused by language priors and visual information loss. To address this, we propose SAVE (Sparse Autoencoder-Driven Visual Information Enhancement), a framework that mitigates hallucination by steering the model along Sparse Autoencoder (SAE) latent features. A binary object-presence question-answering probe identifies the SAE features most indicative of the model's visual information processing, referred to as visual understanding features. Steering the model along these identified features reinforces grounded visual understanding and effectively reduces hallucination. With its simple design, SAVE outperforms state-of-the-art training-free methods on standard benchmarks, achieving a 10%p improvement in CHAIR_S and consistent gains on POPE and MMHal-Bench. Extensive evaluations across multiple models and layers confirm the robustness and generalizability of our approach. Further analysis reveals that steering along visual understanding features suppresses the generation of uncertain object tokens and increases attention to image tokens, mitigating hallucination. Code is released at https://github.com/wiarae/SAVE.
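As a rough illustration of the probe step described in the abstract, the sketch below selects SAE latent features whose activations best separate "object present" from "object absent" answers to binary questions. Everything here is a toy stand-in: the data is synthetic, the dimensions are invented, and mean-difference ranking replaces whatever learned probe the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy SAE activations for answers to binary object-presence questions.
# Assumed setup: n examples, d_sae latent features; label 1 = object present.
n, d_sae = 200, 64
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d_sae))

# Plant a few genuinely informative features in the synthetic data
informative = [5, 21, 40]
for f in informative:
    acts[:, f] += 3.0 * labels

def probe_select(acts, labels, k=3):
    """Rank SAE features by how strongly their mean activation separates
    present vs. absent answers (a simple stand-in for a learned probe)."""
    diff = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return np.argsort(-np.abs(diff))[:k]

selected = probe_select(acts, labels)
```

On this synthetic data the planted features dominate the ranking; in the paper's setting the selected indices would instead be the "visual understanding features" used for steering.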
Problem

Research questions and friction points this paper is trying to address.

Object hallucination in MLLMs driven by language priors
Visual information loss that weakens the model's grounding in the image
Reducing hallucination without fine-tuning or architectural changes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Autoencoder latent features steer model
Binary probe identifies visual understanding features
Steering suppresses uncertain tokens, increases image attention
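The steering idea in the bullets above can be sketched as follows: encode a hidden state with the SAE, then add back the decoder directions of the probe-selected features, scaled by their current activations. All weights, dimensions, and the scaling rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not from the paper)
d_model, d_sae = 8, 32

# Random matrices standing in for a trained SAE's encoder/decoder
W_enc = rng.normal(size=(d_model, d_sae))
W_dec = rng.normal(size=(d_sae, d_model))

def sae_encode(h):
    # ReLU yields a sparse, non-negative latent code
    return np.maximum(h @ W_enc, 0.0)

def steer(h, feature_ids, alpha=2.0):
    """Add alpha * (decoder direction) for each selected SAE feature,
    scaled by that feature's current activation."""
    z = sae_encode(h)
    delta = np.zeros_like(h)
    for f in feature_ids:
        delta += alpha * z[f] * W_dec[f]
    return h + delta

h = rng.normal(size=d_model)   # a hidden state at the chosen layer
visual_features = [3, 17]      # indices a probe might have selected
h_steered = steer(h, visual_features)
```

In the paper's framing, this nudging of the residual stream along visual understanding features is what increases attention to image tokens and suppresses uncertain object tokens.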