Exposing Hallucinations To Suppress Them: VLMs Representation Editing With Generative Anchors

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multimodal large language models (MLLMs) frequently exhibit hallucinations at the object, attribute, and relational levels in vision-language tasks; existing mitigation strategies rely on fine-tuning or handcrafted priors, or compromise informativeness and scalability. To address this, we propose a training-free, self-supervised dual-anchor representation rectification method. It leverages text-to-image generative models to construct a positive anchor (the original image) and a negative anchor (a hallucination-inducing image generated from the caption), then performs contrastive directional push-pull editing within the decoder’s hidden state space to automatically disentangle factual from hallucinated semantic directions. The approach is plug-and-play, architecture-agnostic, and requires no parameter updates. Evaluated on standard hallucination benchmarks (e.g., CHAIR), it substantially reduces hallucination rates—exceeding a 5% absolute reduction for LLaVA-v1.5-7B—while preserving recall and descriptive richness. It demonstrates strong generalization across diverse MLLMs and robustness to input perturbations.

📝 Abstract
Multimodal large language models (MLLMs) have achieved remarkable success across diverse vision-language tasks, yet they remain highly susceptible to hallucinations, producing content that is fluent but inconsistent with visual evidence. Such hallucinations, spanning objects, attributes, and relations, persist even in larger models, while existing mitigation approaches often require additional finetuning, handcrafted priors, or trade-offs that compromise informativeness and scalability. To address this limitation, we propose a training-free, self-supervised method for hallucination mitigation. Our approach introduces a novel hallucination amplification mechanism: a caption is projected into the visual space via a text-to-image model to reveal implicit hallucination signals, serving as a negative anchor, while the original image provides a positive anchor. Leveraging these dual anchors, we edit decoder hidden states by pulling representations toward faithful semantics and pushing them away from hallucination directions. This correction requires no human priors or additional training costs, ensuring both effectiveness and efficiency. Extensive experiments across multiple benchmarks show that our method significantly reduces hallucinations at the object, attribute, and relation levels while largely preserving recall and caption richness, e.g., achieving a hallucination reduction by over 5% using LLaVA-v1.5-7B on CHAIR. Furthermore, results on diverse architectures, including LLaVA-NEXT-7B, Cambrian-8B, and InstructBLIP-7B, validate strong cross-architecture generalization. More importantly, when applied to hallucination-free captions, our method introduces almost no side effects, underscoring its robustness and practical plug-and-play applicability. The implementation will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

Mitigating hallucinations in multimodal large language models
Correcting object, attribute, and relation inconsistencies without training
Enhancing visual-language alignment through representation editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free self-supervised method for hallucination mitigation
Hallucination amplification via text-to-image projection anchors
Editing decoder hidden states using dual semantic anchors
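The dual-anchor push-pull edit listed above can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's exact formulation: `alpha` and `beta` are assumed editing strengths, and `h_pos` / `h_neg` are assumed to be pooled representations derived from the original image (positive anchor) and the text-to-image-generated, hallucination-amplified image (negative anchor).

```python
import numpy as np

def edit_hidden_states(h, h_pos, h_neg, alpha=0.1, beta=0.1):
    """Push-pull correction of decoder hidden states using dual anchors.

    h     : (seq_len, d) decoder hidden states for the caption being generated
    h_pos : (d,) positive-anchor vector (assumed: pooled features of the original image)
    h_neg : (d,) negative-anchor vector (assumed: pooled features of the
            hallucination-amplified image generated from the caption)
    alpha : assumed pull strength toward faithful semantics
    beta  : assumed push strength away from hallucinated semantics
    """
    pull = h_pos - h   # direction toward the faithful anchor
    push = h - h_neg   # direction away from the hallucination anchor
    # Pull representations toward faithful semantics and push them
    # away from the hallucination direction, with no parameter updates.
    return h + alpha * pull + beta * push
```

In practice such an edit would be applied inside the decoder (e.g., via a forward hook on selected layers) at inference time, which is what makes the method training-free and plug-and-play.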
Youxu Shi
University of Science and Technology of China
Suorong Yang
Nanjing University
Computer Vision · Deep Learning · Multimodal Learning
Dong Liu
University of Science and Technology of China