🤖 AI Summary
Large Vision-Language Models (LVLMs) suffer from object hallucination: generating objects not present in the input image. This work is the first to localize the root cause of hallucination at the image-token level, revealing that a small set of high-attention image tokens (only ~1.5% of all image tokens) dominates hallucinatory generation. To address this, we propose EAZY: a zero-shot, training-free, and architecture-agnostic intervention method. EAZY automatically identifies hallucination-relevant tokens via attention analysis and unsupervised importance estimation, then applies adaptive zero-masking to them. Evaluated across multiple LVLM architectures and benchmark datasets, EAZY consistently mitigates hallucination without compromising original task performance. It also improves unsupervised hallucination detection accuracy by 15%, demonstrating precise, lossless, and generalizable hallucination suppression.
📝 Abstract
Despite their remarkable potential, Large Vision-Language Models (LVLMs) still struggle with object hallucination, a problem in which their generated outputs mistakenly incorporate objects that do not actually exist. Although most prior work addresses this issue within the language-model backbone, our work shifts the focus to the image input, investigating how specific image tokens contribute to hallucinations. Our analysis reveals a striking finding: a small subset of image tokens with high attention scores is the primary driver of object hallucination. Removing these hallucinatory image tokens (only 1.5% of all image tokens) effectively mitigates the issue, and this finding holds consistently across different models and datasets. Building on this insight, we introduce EAZY, a novel, training-free method that automatically identifies and Eliminates hAllucinations by Zeroing out hallucinatorY image tokens. We apply EAZY to unsupervised object hallucination detection, achieving a 15% improvement over previous methods. Additionally, EAZY is remarkably effective at mitigating hallucinations while preserving model utility, and it adapts seamlessly to various LVLM architectures.
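To make the core intervention concrete, here is a minimal sketch of the zero-masking step: rank image tokens by the attention they receive and zero out the top ~1.5%. The function name, the NumPy-array token/attention representations, and the fixed masking ratio are illustrative assumptions; the actual method also relies on unsupervised importance estimation to select tokens adaptively.

```python
import numpy as np

def zero_mask_hallucinatory_tokens(image_tokens, attention_scores, mask_ratio=0.015):
    """Illustrative sketch (not the paper's implementation).

    image_tokens: (num_tokens, dim) array of image token embeddings.
    attention_scores: (num_tokens,) aggregated attention each image token
        receives during generation (assumed precomputed).
    mask_ratio: fraction of tokens to zero out (~1.5% per the paper's finding).
    Returns the masked token array and the indices that were zeroed.
    """
    num_tokens = image_tokens.shape[0]
    k = max(1, int(round(num_tokens * mask_ratio)))
    # Indices of the k highest-attention tokens: candidate hallucination drivers.
    top_idx = np.argsort(attention_scores)[-k:]
    masked = image_tokens.copy()
    masked[top_idx] = 0.0  # zero-masking: tokens stay in place but are zeroed
    return masked, top_idx
```

Because the tokens are zeroed rather than removed, the sequence length and positional layout the LVLM expects are preserved, which is what allows the intervention to remain training-free and architecture-agnostic.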