AI Summary
Large Vision-Language Models (LVLMs) suffer from severe hallucination, and existing Contrastive Decoding (CD) methods employ static strategies ill-suited to the heterogeneity of hallucination causes across generation steps. This paper proposes a dynamic contrastive decoding framework (dubbed Octopus) comprising three core components: (1) a step-wise hallucination-type discriminator that identifies the hallucination category at each decoding step; (2) an adaptive mechanism that selects input perturbations and contrastive signals according to the heterogeneous root causes of hallucinations; and (3) a lightweight adapter for efficient integration. The framework breaks away from the conventional "one-size-fits-all" CD paradigm, enabling fine-grained, step-level hallucination suppression. Evaluated on four mainstream multimodal hallucination benchmarks, it achieves significant improvements over state-of-the-art methods and is model-agnostic, highly deployable, and readily extensible.
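To make the step-level workflow concrete, here is a minimal sketch of dynamic contrastive decoding. All names (`classify_hallucination`, `dynamic_cd_step`, the `visual_score` feature, and the two hallucination types) are illustrative assumptions, not the paper's actual API; only the contrastive combination `(1 + alpha) * logits_orig - alpha * logits_perturbed` follows the standard CD formulation.

```python
def classify_hallucination(step_features):
    """Toy stand-in for the step-wise hallucination-type discriminator.
    Here we simply threshold a scalar 'visual grounding' score; the real
    discriminator would be learned. (Hypothetical feature name.)"""
    return "object" if step_features["visual_score"] < 0.5 else "attribute"

def contrastive_logits(logits_orig, logits_pert, alpha=1.0):
    """Standard contrastive-decoding combination of original and
    perturbed-input logits: (1 + alpha) * orig - alpha * perturbed."""
    return [(1 + alpha) * o - alpha * p
            for o, p in zip(logits_orig, logits_pert)]

def dynamic_cd_step(logits_orig, perturbed_logits_by_type, step_features,
                    alpha=1.0):
    """One decoding step: identify the hallucination type for this step,
    then contrast against the perturbation tailored to that type."""
    h_type = classify_hallucination(step_features)
    return contrastive_logits(
        logits_orig, perturbed_logits_by_type[h_type], alpha)

# Toy usage: two candidate tokens, one perturbation branch per type.
orig = [2.0, 1.0]
pert = {"object": [1.5, 1.2], "attribute": [1.8, 0.9]}
out = dynamic_cd_step(orig, pert, {"visual_score": 0.3}, alpha=1.0)
```

At each step the selected branch changes, so the contrastive signal adapts to the hallucination challenge of that step rather than applying one fixed perturbation throughout decoding.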
Abstract
Large Vision-Language Models (LVLMs) have achieved impressive performance in visual content understanding and multi-modal reasoning. Unfortunately, these large models suffer from serious hallucination problems and tend to generate fabricated responses. Recently, several Contrastive Decoding (CD) strategies have been proposed to alleviate hallucination by introducing disturbed inputs. Although great progress has been made, these CD strategies mostly apply a one-size-fits-all approach to all input conditions. In this paper, we revisit this process through extensive experiments. The results show that hallucination causes are hybrid and that each generative step faces a unique hallucination challenge. Leveraging these insights, we introduce a simple yet effective Octopus-like framework that enables the model to adaptively identify hallucination types and create a dynamic CD workflow. Our Octopus framework not only outperforms existing methods across four benchmarks but also demonstrates excellent deployability and expansibility. Code is available at https://github.com/LijunZhang01/Octopus.