DASH: Detection and Assessment of Systematic Hallucinations of VLMs

📅 2025-03-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Vision-language models (VLMs) frequently hallucinate objects in open-world images, yet existing benchmarks, which rely on relatively small labeled datasets with limited scale and semantic coverage, fail to expose systematic failure patterns. To address this, we propose DASH (Detection and Assessment of Systematic Hallucinations), a fully automatic, large-scale pipeline for finding systematic hallucinations on real-world images. Its key component, DASH-OPT, performs image-based retrieval: it optimizes over the natural image manifold to generate images that mislead the VLM, and these images guide the retrieval of real images. DASH outputs clusters of real, semantically similar images on which the VLM hallucinates an object. Applied to PaliGemma and two LLaVA-NeXT models across 380 object classes, DASH finds more than 19,000 hallucination clusters containing 950,000 images in total. We study how the identified hallucinations transfer to other VLMs and show that fine-tuning PaliGemma on the model-specific images found by DASH mitigates object hallucinations. Together, these results support large-scale measurement and mitigation of systematic hallucinations in open-world VLM deployment.
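The core idea of DASH-OPT, searching over a manifold of valid images for one that makes the VLM report an object, can be illustrated with a toy gradient-ascent sketch. Everything here is a hypothetical stand-in, not the authors' code: `generate` is a tanh "generator" that keeps pixels bounded, `w_vlm` and `w_det` define tiny logistic models for the VLM's "yes" probability and for an object detector, and the detector penalty (weighted by `lam`) is our assumption about how one would keep the object genuinely absent from the generated image.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate(z: Vector) -> Vector:
    # Hypothetical toy "generator": tanh keeps every pixel in (-1, 1), a crude
    # stand-in for restricting the search to a manifold of valid images.
    return [math.tanh(v) for v in z]


def scores(z: Vector, w_vlm: Vector, w_det: Vector) -> Tuple[float, float]:
    img = generate(z)
    p_vlm = sigmoid(dot(w_vlm, img))  # toy P(VLM answers "yes, the object is present")
    p_det = sigmoid(dot(w_det, img))  # toy P(detector finds the object)
    return p_vlm, p_det


def objective(z: Vector, w_vlm: Vector, w_det: Vector, lam: float = 1.0) -> float:
    # High when the VLM says "yes" while the detector sees no object.
    p_vlm, p_det = scores(z, w_vlm, w_det)
    return math.log(p_vlm) + lam * math.log(1.0 - p_det)


def optimize_latent(w_vlm: Vector, w_det: Vector, lam: float = 1.0,
                    lr: float = 0.5, steps: int = 200) -> Vector:
    z = [0.0] * len(w_vlm)
    for _ in range(steps):
        img = generate(z)
        p_vlm, p_det = scores(z, w_vlm, w_det)
        # Chain rule: d objective / d img_i = (1 - p_vlm) * w_vlm_i - lam * p_det * w_det_i,
        # and d img_i / d z_i = 1 - img_i ** 2 (derivative of tanh).
        z = [
            zi + lr * (1.0 - xi * xi) * ((1.0 - p_vlm) * wv - lam * p_det * wd)
            for zi, xi, wv, wd in zip(z, img, w_vlm, w_det)
        ]
    return z


w_vlm, w_det = [2.0, 1.0, 0.0], [0.0, 1.0, 2.0]
z = optimize_latent(w_vlm, w_det)
p_vlm, p_det = scores(z, w_vlm, w_det)
print(f"P(VLM yes) = {p_vlm:.2f}, P(detector) = {p_det:.2f}")
```

In a realistic setting, `z` would be the latent of a pretrained image generator and the gradients would come from automatic differentiation through the VLM; the closed-form gradient above only works because the toy models are logistic.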

📝 Abstract

Vision-language models (VLMs) are prone to object hallucinations, where they erroneously indicate the presence of certain objects in an image. Existing benchmarks quantify hallucinations using relatively small, labeled datasets. However, this approach is i) insufficient to assess hallucinations that arise in open-world settings, where VLMs are widely used, and ii) inadequate for detecting systematic errors in VLMs. We propose DASH (Detection and Assessment of Systematic Hallucinations), an automatic, large-scale pipeline designed to identify systematic hallucinations of VLMs on real-world images in an open-world setting. A key component is DASH-OPT for image-based retrieval, where we optimize over the "natural image manifold" to generate images that mislead the VLM. The output of DASH consists of clusters of real and semantically similar images for which the VLM hallucinates an object. We apply DASH to PaliGemma and two LLaVA-NeXT models across 380 object classes and, in total, find more than 19k clusters with 950k images. We study the transfer of the identified systematic hallucinations to other VLMs and show that fine-tuning PaliGemma with the model-specific images obtained with DASH mitigates object hallucinations. Code and data are available at https://YanNeu.github.io/DASH.
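The transfer study can be made concrete with a simple per-cluster metric. This is a hypothetical illustration, not the paper's evaluation code: it assumes every image in a cluster is known not to contain the object, so any "yes" from another VLM (`other_vlm_says_yes`, a placeholder callable) counts as a transferred hallucination.

```python
from typing import Callable, List, Tuple


def transfer_rates(
    clusters: List[List[str]],
    object_name: str,
    other_vlm_says_yes: Callable[[str, str], bool],
) -> Tuple[List[float], float]:
    """Return per-cluster transfer rates and the image-weighted overall rate.

    Assumes every image in `clusters` is known NOT to contain `object_name`,
    so a "yes" from the other VLM counts as a transferred hallucination.
    """
    per_cluster: List[float] = []
    total_hits = 0
    total_images = 0
    for image_ids in clusters:
        hits = sum(1 for img in image_ids if other_vlm_says_yes(img, object_name))
        per_cluster.append(hits / len(image_ids))
        total_hits += hits
        total_images += len(image_ids)
    overall = total_hits / total_images if total_images else 0.0
    return per_cluster, overall


# Usage with a placeholder "VLM" that says yes for a fixed set of image IDs.
yes_ids = {"img1", "img2", "img3"}
per_cluster, overall = transfer_rates(
    [["img1", "img2"], ["img3", "img4"]], "dog", lambda img, obj: img in yes_ids
)
print(per_cluster, overall)  # prints [1.0, 0.5] 0.75
```

Reporting both numbers separates clusters that transfer completely from a diluted average over all images.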

Problem

Research questions and friction points this paper is trying to address.

Detects systematic object hallucinations in vision-language models

Assesses hallucinations in open-world settings using large-scale data

Mitigates errors by fine-tuning models with identified hallucination clusters

Innovation

Methods, ideas, or system contributions that make the work stand out.

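Automatic, large-scale pipeline for detecting systematic VLM hallucinations

Image-based retrieval (DASH-OPT) that optimizes over the natural image manifold to generate misleading images

Clusters of real, semantically similar images on which the VLM hallucinates an object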
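A minimal sketch of the clustering step, assuming precomputed image embeddings plus per-image flags for whether the VLM answered "yes" and whether a detector found the object. The greedy cosine-similarity clustering, the `threshold` and `min_size` parameters, and the detector-based filter are illustrative choices, not the method described in the paper.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def cluster_hallucinations(
    embeddings: Dict[str, Vector],
    vlm_says_yes: Dict[str, bool],
    detector_found: Dict[str, bool],
    threshold: float = 0.9,
    min_size: int = 2,
) -> List[List[str]]:
    # Hypothetical filter: a hallucination is a VLM "yes" on an image
    # in which the detector does not find the object.
    hallucinated = [i for i in embeddings if vlm_says_yes[i] and not detector_found[i]]

    clusters: List[List[str]] = []
    centroids: List[Vector] = []
    for img_id in hallucinated:
        emb = embeddings[img_id]
        sims = [cosine(emb, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=-1)
        if best >= 0 and sims[best] >= threshold:
            clusters[best].append(img_id)
            n = len(clusters[best])
            # Running mean keeps the centroid representative of its members.
            centroids[best] = [(c * (n - 1) + e) / n for c, e in zip(centroids[best], emb)]
        else:
            clusters.append([img_id])
            centroids.append(list(emb))

    # Keep only recurring patterns; a single image is not a systematic error.
    return [c for c in clusters if len(c) >= min_size]
```

Greedy assignment is order-dependent but needs only one pass, which matters when the candidate pool is large; a production pipeline would more likely use an approximate nearest-neighbor index and a standard clustering algorithm.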
