VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the persistent challenge of hallucination in vision-language models, where generated text contradicts the visual content—a problem inadequately mitigated by existing detection approaches. The paper is the first to apply the variational information bottleneck framework to hallucination detection: a lightweight probe extracts discriminative signals from internal attention-head activations while filtering out semantic noise. Using the probe's gradients for causal analysis, the method then identifies the attention heads most responsible for grounding textual output in visual input and intervenes on them at inference time—without retraining the model. This inference-time intervention achieves state-of-the-art performance across multiple benchmarks, significantly enhancing the factual consistency of generated text with respect to the source image.
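The intervention step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-head saliency scores (e.g. gradient magnitudes of the probe's output with respect to each head) and simply rescales the outputs of the most influential heads; the paper's actual intervention rule may differ. All names, shapes, and the scaling factor are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: saliency scores (e.g. probe-gradient magnitudes) for
# L layers x H heads, and the corresponding head outputs at one decoding step.
L, H, D = 4, 8, 16
saliency = rng.random((L, H))
head_out = rng.normal(size=(L, H, D))

def intervene(head_out, saliency, top_k=5, alpha=1.5):
    """Rescale the top-k most causally influential heads; leave the rest unchanged."""
    out = head_out.copy()
    top = np.argsort(saliency.ravel())[-top_k:]  # flat indices of top-k heads
    for idx in top:
        l, h = divmod(idx, saliency.shape[1])    # recover (layer, head)
        out[l, h] *= alpha                       # steer toward grounded heads
    return out

steered = intervene(head_out, saliency)
```

Because the edit touches only a handful of head outputs at inference time, it leaves the model's weights untouched, which is what makes the approach retraining-free.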

📝 Abstract
Vision-Language Models (VLMs) have demonstrated remarkable progress in multimodal tasks, but remain susceptible to hallucinations, where generated text deviates from the underlying visual content. Existing hallucination detection methods primarily rely on output logits or external verification tools, often overlooking the model's internal mechanisms. In this work, we investigate the outputs of internal attention heads, postulating that specific heads carry the primary signals for truthful generation. However, directly probing these high-dimensional states is challenging due to the entanglement of visual-linguistic syntax and noise. To address this, we propose VIB-Probe, a novel hallucination detection and mitigation framework leveraging the Variational Information Bottleneck (VIB) theory. Our method extracts discriminative patterns across layers and heads while filtering out semantic nuisances through the information bottleneck principle. Furthermore, by leveraging the gradients of our VIB probe, we identify attention heads with strong causal influence on hallucinations and introduce an inference-time intervention strategy for hallucination mitigation. Extensive experiments across diverse benchmarks demonstrate that VIB-Probe significantly outperforms existing baselines in both the detection and mitigation settings. Our code will be made publicly available.
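The probe the abstract describes can be illustrated with a minimal sketch: attention-head features are compressed through a stochastic Gaussian bottleneck (the VIB), and a linear head scores hallucination from the compressed code, with a KL term penalizing information the code retains about the input. This is a toy forward pass under assumed shapes, with randomly initialized parameters; the paper's architecture, feature pooling, and training objective are not specified here, so every name and dimension below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: D-dim pooled attention-head features, K-dim bottleneck.
D, K = 64, 8

# Probe parameters (randomly initialized for the sketch; trained in practice).
W_mu, b_mu = rng.normal(0, 0.1, (K, D)), np.zeros(K)
W_lv, b_lv = rng.normal(0, 0.1, (K, D)), np.zeros(K)
w_cls, b_cls = rng.normal(0, 0.1, K), 0.0

def vib_probe(h, beta=1e-3):
    """Forward pass: head features h -> hallucination probability + KL penalty."""
    mu = W_mu @ h + b_mu                                 # bottleneck mean
    logvar = W_lv @ h + b_lv                             # bottleneck log-variance
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=K)   # reparameterization sample
    logit = w_cls @ z + b_cls                            # linear hallucination score
    p = 1.0 / (1.0 + np.exp(-logit))                     # P(hallucination)
    # KL( N(mu, sigma^2) || N(0, I) ): the information-bottleneck penalty that
    # discourages the code z from retaining semantic nuisances of h.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return p, beta * kl

h = rng.normal(size=D)   # stand-in for pooled attention-head outputs
p, penalty = vib_probe(h)
```

Training would minimize the classification loss plus the weighted KL term, so the bottleneck keeps only the features predictive of hallucination; the probe's input gradients can then rank heads by causal influence.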
Problem

Research questions and friction points this paper is trying to address.

hallucination
vision-language models
attention heads
multimodal generation
truthful generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Variational Information Bottleneck
Hallucination Detection
Vision-Language Models
Attention Heads
Inference-time Intervention