🤖 AI Summary
Large Vision-Language Models (LVLMs) commonly suffer from hallucinations—textual outputs inconsistent with the input image. Existing contrastive decoding methods, which rely on global visual uncertainty estimation, fail to precisely localize and suppress hallucinated tokens and may even introduce new hallucinations. To address this, we propose Hallucination-Induced Optimization (HIO), a theory-driven framework featuring: (1) a novel fine-grained hallucination token identification mechanism; and (2) Contrary Bradley–Terry preference modeling coupled with multi-stage logits reweighting to enable targeted contrastive reinforcement between hallucinated and grounded tokens. Unlike prior global uncertainty approaches, HIO operates at the token level, enabling precise hallucination mitigation. Extensive experiments demonstrate that HIO significantly reduces hallucination rates across multiple benchmarks while improving both output faithfulness and cross-modal alignment accuracy, consistently outperforming state-of-the-art methods.
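To make the "contrastive logits gap" idea concrete, here is a minimal sketch of one step of visual contrastive decoding. The function name, the specific arrays, and the single-knob `alpha` weighting are illustrative assumptions, not the paper's exact formulation (HIO additionally applies fine-grained token identification and multi-stage reweighting on top of this basic contrast):

```python
import numpy as np

def contrastive_decode_step(logits_grounded, logits_hallucinatory, alpha=1.0):
    """One step of (hypothetical) contrastive decoding.

    logits_grounded:      next-token logits conditioned on the full input image.
    logits_hallucinatory: logits from a hallucination-prone pass (e.g. a
                          distorted or masked image).

    Subtracting the hallucinatory logits widens the gap between grounded
    and hallucinated tokens before the next token is chosen.
    """
    contrast = (1 + alpha) * logits_grounded - alpha * logits_hallucinatory
    # Greedy pick: the token whose grounded evidence most exceeds its
    # hallucinatory evidence.
    return int(np.argmax(contrast))
```

The summary's criticism of prior work maps onto this sketch directly: if `logits_hallucinatory` comes from a *global* uncertainty perturbation, the subtraction can penalize grounded tokens and boost spurious ones, which is the failure mode HIO's token-level induction is designed to avoid.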
📝 Abstract
Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropriately widens the contrastive logits gap between hallucinatory and targeted tokens. However, due to the uncontrollable nature of global visual uncertainty, these methods struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations and may even lead to the generation of undesired new hallucinations. To tackle this issue, we conducted a theoretical analysis of the conditions under which contrastive decoding is effective. Building on this insight, we introduce a novel optimization strategy named Hallucination-Induced Optimization (HIO). This strategy amplifies the contrast between hallucinatory and targeted tokens by relying on a fine-tuned theoretical preference model (i.e., a Contrary Bradley-Terry Model), thereby facilitating efficient contrastive decoding to alleviate hallucinations in LVLMs. Extensive experiments demonstrate that our HIO strategy effectively reduces hallucinations in LVLMs, outperforming state-of-the-art methods across various benchmarks.
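The Contrary Bradley-Terry Model can be understood against the standard Bradley-Terry preference model, where the probability that one response beats another is a sigmoid of their score difference. The sketch below shows that baseline and one plausible reading of the "contrary" variant, in which the preference is deliberately reversed so the auxiliary model learns to favor hallucinatory tokens, amplifying them so the contrastive gap at decoding time is wider. The function names and the exact reversal are illustrative assumptions, not the paper's verbatim objective:

```python
import math

def bradley_terry_nll(score_preferred, score_rejected):
    """Negative log-likelihood under the standard Bradley-Terry model:
    P(preferred beats rejected) = sigmoid(score_preferred - score_rejected)."""
    p = 1.0 / (1.0 + math.exp(-(score_preferred - score_rejected)))
    return -math.log(p)

def contrary_bradley_terry_nll(score_target, score_halluc):
    """Sketch of a 'contrary' objective (hypothetical reading): swap the
    roles so the model is trained to PREFER the hallucinatory token's score,
    inducing hallucinations in the contrast branch on purpose."""
    return bradley_terry_nll(score_halluc, score_target)
```

Minimizing the contrary loss pushes hallucinatory scores up relative to targeted ones in the induced model, which is exactly what a subtractive contrastive decoder needs on its "negative" branch.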