Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding

πŸ“… 2025-03-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large Vision-Language Models (LVLMs) suffer from severe hallucination, and existing Contrastive Decoding (CD) methods employ static strategies ill-suited to the heterogeneity of hallucination causes across generation steps. This paper proposes Octopus, a dynamic Contrastive Decoding framework with three core components: (1) a step-wise discriminator that identifies the hallucination type at each decoding step; (2) an adaptive mechanism that selects input perturbations and contrastive signals according to the heterogeneous root causes of hallucination; and (3) a lightweight adapter for efficient integration. Octopus breaks away from the conventional "one-size-fits-all" CD paradigm, enabling fine-grained, step-level hallucination suppression. Evaluated on four mainstream multimodal hallucination benchmarks, it achieves significant improvements over state-of-the-art methods while remaining model-agnostic, easily deployable, and readily extensible.
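The step-level idea above can be sketched in code. This is a minimal illustration, not the paper's implementation: `contrastive_step` implements the standard CD update (amplify the gap between clean and perturbed next-token logits, restricted to tokens plausible under the clean model), and `dynamic_cd_step` is a hypothetical routing layer standing in for the paper's discriminator, which picks the perturbation branch matching the predicted hallucination type. All function and parameter names here are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def contrastive_step(logits_clean, logits_perturbed, alpha=1.0, beta=0.1):
    """One contrastive-decoding step in the common CD form:
    (1 + alpha) * clean - alpha * perturbed, keeping only tokens whose
    clean-model probability is at least beta * max (plausibility filter)."""
    probs = softmax(logits_clean)
    threshold = beta * max(probs)
    return [
        (1 + alpha) * c - alpha * p if probs[i] >= threshold else float("-inf")
        for i, (c, p) in enumerate(zip(logits_clean, logits_perturbed))
    ]

def dynamic_cd_step(logits_clean, perturbed_by_type, hallucination_type):
    """Hypothetical dynamic variant: a per-step discriminator has labeled
    the hallucination type; route to that type's perturbed logits."""
    return contrastive_step(logits_clean, perturbed_by_type[hallucination_type])
```

The dynamic piece is deliberately tiny: the actual framework learns the discriminator and the per-type perturbations, whereas here the routing is a dictionary lookup to show where the step-wise decision plugs into the decoding loop.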

πŸ“ Abstract
Large Vision-Language Models (LVLMs) have obtained impressive performance in visual content understanding and multi-modal reasoning. Unfortunately, these large models suffer from serious hallucination problems and tend to generate fabricated responses. Recently, several Contrastive Decoding (CD) strategies have been proposed to alleviate hallucination by introducing disturbed inputs. Although great progress has been made, these CD strategies mostly apply a one-size-fits-all approach for all input conditions. In this paper, we revisit this process through extensive experiments. Related results show that hallucination causes are hybrid and each generative step faces a unique hallucination challenge. Leveraging these meaningful insights, we introduce a simple yet effective Octopus-like framework that enables the model to adaptively identify hallucination types and create a dynamic CD workflow. Our Octopus framework not only outperforms existing methods across four benchmarks but also demonstrates excellent deployability and expansibility. Code is available at https://github.com/LijunZhang01/Octopus.
Problem

Research questions and friction points this paper is trying to address.

Addresses severe hallucination in Large Vision-Language Models (LVLMs).
Existing Contrastive Decoding (CD) strategies apply a static, one-size-fits-all scheme, even though hallucination causes are hybrid and vary across generation steps.
Seeks a dynamic CD workflow that adapts to the hallucination type at each decoding step while improving performance across benchmarks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Contrastive Decoding for hallucination reduction
Adaptive identification of hallucination types
Octopus framework enhances deployability and expansibility
πŸ”Ž Similar Papers
No similar papers found.
Wei Suo
School of Computer Science and Ningbo Institute, Northwestern Polytechnical University, China.
Lijun Zhang
School of Computer Science and Ningbo Institute, Northwestern Polytechnical University, China.
Mengyang Sun
Northwestern Polytechnical University
computer vision, vision-language interaction
Lin Yuanbo Wu
Swansea University
Computer Vision, AI Generation, Trustworthy AI, Autonomous System, Embodied Visual Intelligence
Peng Wang
School of Computer Science and Ningbo Institute, Northwestern Polytechnical University, China.
Yanning Zhang
Northwestern Polytechnical University
Computer Vision