Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

πŸ“… 2025-03-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large Vision-Language Models (LVLMs) suffer from severe hallucination, and existing Contrastive Decoding (CD) methods employ static strategies ill-suited to the heterogeneity of hallucination causes across generation steps. This paper proposes Octopus, a dynamic Contrastive Decoding framework with three core components: (1) a step-wise discriminator that identifies the hallucination type at each decoding step; (2) an adaptive mechanism that selects input perturbations and contrastive signals according to the heterogeneous root causes of hallucination; and (3) a lightweight adapter for efficient integration. Octopus breaks away from the conventional "one-size-fits-all" CD paradigm, enabling fine-grained, step-level hallucination suppression. Evaluated on four mainstream multimodal hallucination benchmarks, it achieves significant improvements over state-of-the-art methods while remaining model-agnostic, easily deployable, and readily extensible.
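The step-level idea above can be sketched in code. This is a minimal illustration, not the paper's implementation: `contrastive_step` implements the standard CD update (amplify the gap between clean and perturbed next-token logits, restricted to tokens plausible under the clean model), and `dynamic_cd_step` is a hypothetical routing layer standing in for the paper's discriminator, which picks the perturbation branch matching the predicted hallucination type. All function and parameter names here are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def contrastive_step(logits_clean, logits_perturbed, alpha=1.0, beta=0.1):
    """One contrastive-decoding step in the common CD form:
    (1 + alpha) * clean - alpha * perturbed, keeping only tokens whose
    clean-model probability is at least beta * max (plausibility filter)."""
    probs = softmax(logits_clean)
    threshold = beta * max(probs)
    return [
        (1 + alpha) * c - alpha * p if probs[i] >= threshold else float("-inf")
        for i, (c, p) in enumerate(zip(logits_clean, logits_perturbed))
    ]

def dynamic_cd_step(logits_clean, perturbed_by_type, hallucination_type):
    """Hypothetical dynamic variant: a per-step discriminator has labeled
    the hallucination type; route to that type's perturbed logits."""
    return contrastive_step(logits_clean, perturbed_by_type[hallucination_type])
```

The dynamic piece is deliberately tiny: the actual framework learns the discriminator and the per-type perturbations, whereas here the routing is a dictionary lookup to show where the step-wise decision plugs into the decoding loop.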

πŸ“ Abstract
Large Vision-Language Models (LVLMs) have obtained impressive performance in visual content understanding and multi-modal reasoning. Unfortunately, these large models suffer from serious hallucination problems and tend to generate fabricated responses. Recently, several Contrastive Decoding (CD) strategies have been proposed to alleviate hallucination by introducing disturbed inputs. Although great progress has been made, these CD strategies mostly apply a one-size-fits-all approach for all input conditions. In this paper, we revisit this process through extensive experiments. Related results show that hallucination causes are hybrid and each generative step faces a unique hallucination challenge. Leveraging these meaningful insights, we introduce a simple yet effective Octopus-like framework that enables the model to adaptively identify hallucination types and create a dynamic CD workflow. Our Octopus framework not only outperforms existing methods across four benchmarks but also demonstrates excellent deployability and expansibility. Code is available at https://github.com/LijunZhang01/Octopus.
Problem

Research questions and friction points this paper is trying to address.

Addresses severe hallucination in Large Vision-Language Models (LVLMs).
Existing Contrastive Decoding (CD) strategies apply a static, one-size-fits-all scheme, even though hallucination causes are hybrid and vary across generation steps.
Seeks a dynamic CD workflow that adapts to the hallucination type at each decoding step while improving performance across benchmarks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Contrastive Decoding for hallucination reduction
Adaptive identification of hallucination types
Octopus framework enhances deployability and expansibility
πŸ”Ž Similar Papers
No similar papers found.
Wei Suo
School of Computer Science and Ningbo Institute, Northwestern Polytechnical University, China.
Lijun Zhang
School of Computer Science and Ningbo Institute, Northwestern Polytechnical University, China.
Mengyang Sun
Northwestern Polytechnical University
computer vision, vision-language interaction
Lin Yuanbo Wu
Swansea University
Computer Vision, AI Generation, Trustworthy AI, Autonomous System, Embodied Visual Intelligence
Peng Wang
School of Computer Science and Ningbo Institute, Northwestern Polytechnical University, China.
Yanning Zhang
Northwestern Polytechnical University
Computer Vision