Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation

📅 2025-07-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large Vision-Language Models (LVLMs) suffer from severe hallucination, significantly undermining their reliability and practical deployment. To address this, we propose SEED, a self-evolving distillation framework that achieves *intrinsic*, tool-free hallucination purification: it decomposes and recombines internal knowledge to identify and isolate hallucinatory components; introduces a Hallucination-Elimination Adapter to rectify "dark knowledge"; and employs mode-seeking distillation to prevent voids in the output space. This end-to-end approach substantially improves reliability on representative LVLMs such as LLaVA-1.5 and InternVL2; for LLaVA-1.5, it raises the F1 score on POPE-Random from 81.3 to 88.3. Our core contribution is the first lightweight, knowledge-driven paradigm for hallucination mitigation, grounded in *self-purification* of internal representations rather than external supervision or post-hoc filtering.

📝 Abstract
Large Vision-Language Models (LVLMs) have demonstrated remarkable advancements in numerous areas such as multimedia. However, hallucination issues significantly limit their credibility and application potential. Existing mitigation methods typically rely on external tools or comparisons across multiple inference rounds, which significantly increase inference time. In this paper, we propose **SE**lf-**E**volving **D**istillation (**SEED**), which identifies hallucinations within the inner knowledge of LVLMs, isolates and purges them, and then distills the purified knowledge back into the model, enabling self-evolution. Furthermore, we identified that traditional distillation methods are prone to inducing void spaces in the output space of LVLMs. To address this issue, we propose a Mode-Seeking Evolving approach, which performs distillation to capture the dominant modes of the purified knowledge distribution, thereby avoiding the chaotic results that could emerge from void spaces. Moreover, we introduce a Hallucination Elimination Adapter, which corrects the dark knowledge of the original model by learning from the purified knowledge. Extensive experiments on multiple benchmarks validate the superiority of our SEED, demonstrating substantial improvements in mitigating hallucinations for representative LVLM models such as LLaVA-1.5 and InternVL2. Remarkably, the F1 score of LLaVA-1.5 on the hallucination evaluation benchmark POPE-Random improved from 81.3 to 88.3.
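The page gives no implementation details for Mode-Seeking Evolving. A common way to make distillation mode-seeking is to minimize the reverse KL divergence KL(student ∥ teacher), which penalizes the student for placing probability mass where the purified teacher places little, so the student concentrates on dominant modes rather than void regions. A minimal PyTorch sketch under that assumption (the function name, temperature, and shapes are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def mode_seeking_distill_loss(student_logits: torch.Tensor,
                              teacher_logits: torch.Tensor,
                              temperature: float = 1.0) -> torch.Tensor:
    """Reverse-KL distillation, KL(student || teacher).

    Forward KL is mean-seeking and can force the student to spread
    mass over low-density "void" regions of the teacher distribution;
    reverse KL is mode-seeking, so the student concentrates on the
    dominant modes of the purified teacher instead.
    Both logit tensors have shape (..., vocab_size).
    """
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    # KL(p_s || p_t) = E_{p_s}[log p_s - log p_t], summed over the vocab
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)
    return kl.mean()
```

In SEED's framing, the teacher distribution would come from the purified knowledge and the student from the evolving model, with the loss applied per output token.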
Problem

Research questions and friction points this paper is trying to address.

Mitigating hallucinations in Large Vision-Language Models (LVLMs)
Reducing reliance on external tools for hallucination mitigation
Avoiding void spaces in the LVLM output space during distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Evolving Distillation (SEED) for LVLMs
Mode-Seeking Evolving captures the dominant modes of the purified distribution, avoiding chaotic results from void output spaces
Hallucination Elimination Adapter corrects the original model's dark knowledge (a hedged sketch follows this list)
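The page does not describe the adapter's architecture. One plausible minimal reading, sketched here, is a bottleneck residual adapter on the hidden states, trained against the purified distribution while the base LVLM stays frozen; the class name, placement, and dimensions below are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class HallucinationEliminationAdapter(nn.Module):
    """Hypothetical bottleneck adapter (architecture assumed, not
    taken from the paper). It learns a low-rank residual correction
    to the frozen base model's hidden states, which is one way to
    rectify "dark knowledge" with very few trainable parameters.
    """

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Zero-init the up-projection so training starts from the
        # identity mapping and the base model's behavior is unchanged.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual correction: base hidden state plus learned adjustment.
        return hidden + self.up(self.act(self.down(hidden)))
```

Training would then minimize a distillation loss such as the mode-seeking one sketched above between the adapter-corrected outputs and the purified knowledge, updating only the adapter's few parameters.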
🔎 Similar Papers
No similar papers found.
👥 Authors
Wenhao Li (University of Sydney)
Xiu Su (Central South University)
Jingyi Wu (Fudan University)
Feng Yang (Southeast University)
Yang Liu (Fudan University)
Yi Chen (HKUST)
Shan You (SenseTime Research)
Chang Xu (University of Sydney)