Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression

📅 2026-03-11

🤖 AI Summary

This work addresses the prevalent issue of vision-induced hallucinations in large vision-language models (LVLMs), where generated outputs contradict the actual image content. To mitigate this without additional training, the authors propose CIPHER, an inference-time intervention method. CIPHER uses a diffusion model to edit images so that they contradict their original ground-truth captions, yielding the counterfactual dataset OHC-25K (25,000 samples). Pairing the edited images with the unchanged captions and passing them through the LVLM exposes hallucination-related shifts in its intermediate features relative to authentic (image, caption) pairs. These shifts span a low-rank subspace, and at inference time CIPHER suppresses hallucinations by projecting hidden states onto the orthogonal complement of that subspace. Experiments on multiple benchmarks show that CIPHER significantly reduces hallucination rates while preserving the model’s original task performance, supporting the method's effectiveness and generalizability.
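The orthogonal-projection step can be illustrated with a minimal NumPy sketch. It assumes the hallucination subspace is already available as a matrix U of shape (d, k) with orthonormal columns and that the correction is applied to hidden-state vectors of dimension d. The function name project_out, the sizes d and k, and the random basis are hypothetical stand-ins, not the authors' code; which layers and tokens are corrected is not specified here.

```python
import numpy as np

def project_out(h: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Hypothetical helper: remove the component of h that lies in span(U).

    h: hidden state(s), shape (d,) or (n, d).
    U: basis of the (assumed) hallucination subspace, shape (d, k),
       with orthonormal columns.
    Returns h - (h U) U^T, i.e. h projected onto the orthogonal complement.
    """
    return h - (h @ U) @ U.T

# Toy demo with a random orthonormal basis standing in for the learned subspace.
rng = np.random.default_rng(0)
d, k = 64, 4
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # reduced QR: U is (d, k)
h = rng.standard_normal(d)
h_clean = project_out(h, U)
print(np.allclose(U.T @ h_clean, 0.0))  # True: no component left in span(U)
```

Because the projection is idempotent, applying it twice changes nothing, and directions orthogonal to U pass through untouched, which is consistent with the stated goal of preserving task performance.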

📝 Abstract

While large vision-language models (LVLMs) achieve strong performance on multimodal tasks, they frequently generate hallucinations: unfaithful outputs misaligned with the visual input. To address this issue, we introduce CIPHER (Counterfactual Image Perturbations for Hallucination Extraction and Removal), a training-free method that suppresses vision-induced hallucinations via lightweight feature-level correction. Unlike prior training-free approaches that primarily focus on text-induced hallucinations, CIPHER explicitly targets hallucinations arising from the visual modality. CIPHER operates in two phases. In the offline phase, we construct OHC-25K (Object-Hallucinated Counterfactuals, 25,000 samples), a counterfactual dataset consisting of diffusion-edited images that intentionally contradict the original ground-truth captions. We pair these edited images with the unchanged ground-truth captions and process them through an LVLM to extract hallucination-related representations. Contrasting these representations with those from authentic (image, caption) pairs reveals structured, systematic shifts spanning a low-rank subspace characterizing vision-induced hallucination. In the inference phase, CIPHER suppresses hallucinations by projecting intermediate hidden states away from this subspace. Experiments across multiple benchmarks show that CIPHER significantly reduces hallucination rates while preserving task performance, demonstrating the effectiveness of counterfactual visual perturbations for improving LVLM faithfulness. Code and additional materials are available at https://hamidreza-dastmalchi.github.io/cipher-cvpr2026/.
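The offline step of turning representation shifts into a low-rank subspace can be sketched as follows. This is an illustration under stated assumptions, not the released implementation: it assumes hidden states from one layer, rows of H_real (authentic pairs) and H_cf (counterfactual pairs) aligned by sample, and a truncated SVD of the uncentered difference matrix as the way to extract the top-k directions. The function name estimate_hallucination_subspace and the choice of k are hypothetical.

```python
import numpy as np

def estimate_hallucination_subspace(H_real: np.ndarray,
                                    H_cf: np.ndarray,
                                    k: int) -> np.ndarray:
    """Hypothetical sketch: top-k directions of the shifts H_cf - H_real.

    H_real, H_cf: shape (n, d), hidden states from the same layer for the
        authentic and the counterfactual (image, caption) pairs, with row i
        of each matrix coming from the same caption.
    k: assumed rank of the hallucination subspace.
    Returns U of shape (d, k) with orthonormal columns.
    """
    D = H_cf - H_real  # per-sample representation shift
    # Uncentered SVD (an assumption): a systematic mean shift is kept,
    # because it may itself be a hallucination-related direction.
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k].T  # top-k right singular vectors as columns
```

The returned U has the shape and orthonormality that an orthogonal-projection step such as h - (h U) U^T expects. Centering D first (i.e., PCA) is an equally plausible variant; it would discard the average shift and keep only the directions along which the shifts vary.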

Problem

Research questions and friction points this paper is trying to address.

hallucination

vision-language models

visual modality

faithfulness

multimodal tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactual perturbations

vision-induced hallucination

diffusion-guided editing