Adversarial Orthogonal Disentanglement for LVLM Hallucination Mitigation

📅 2026-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the reliability problems that hallucinations (generations inconsistent with the visual input) cause in large vision-language models (LVLMs). The authors propose the Adversarial Orthogonal Disentanglement (AOD) framework, which geometrically disentangles hallucination-related features in latent space for the first time. Through minimax training, a classifier concentrates hallucination signals into the component projected onto a learned direction, while an adversary, connected through a gradient reversal layer, removes them from the orthogonal residual subspace. The learned direction then drives a training-free, dual-forward-pass contrastive decoding strategy that suppresses hallucinations at inference time. Notably, this decoding step requires no model fine-tuning or external intervention such as retrieval, and the approach transfers across diverse datasets. Experiments on three prominent LVLMs show that AOD improves POPE accuracy by over 6% on average and AMBER scores by 6%, while maintaining strong performance on general multimodal benchmarks such as MMMU, supporting its effectiveness and broad applicability.
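To make the "projected component" and "orthogonal residual subspace" concrete, here is a minimal sketch assuming a single learned direction u (unit-normalized) in the hidden space: a hidden state h splits into (h·u)u plus the residual h − (h·u)u, and the residual is orthogonal to u by construction. The function names and toy vectors are hypothetical illustrations, not code from the AOD release.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def decompose(h, direction):
    """Split hidden state h into its component along `direction` and the orthogonal residual.

    proj = (h . u) u and residual = h - proj, where u is the unit-normalized direction.
    """
    u = normalize(direction)
    coeff = dot(h, u)
    proj = [coeff * x for x in u]
    residual = [a - b for a, b in zip(h, proj)]
    return proj, residual


h = [1.0, 2.0]          # toy hidden state (hypothetical)
hall_dir = [3.0, 4.0]   # hypothetical learned hallucination direction
proj, residual = decompose(h, hall_dir)
print(proj, residual)
```

In this toy case u = (0.6, 0.8), so the projected component is about (1.32, 1.76) and the residual is about (−0.32, 0.24), which has zero inner product with u.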
📝 Abstract
Large Vision-Language Models (LVLMs) have advanced multimodal understanding, yet their reliability is limited by hallucination, where generated content conflicts with visual facts. Existing mitigation methods either rely on costly external interventions, such as instruction tuning and retrieval, or use internal mechanisms that remain limited by flawed attention weights and entangled hidden representations. We propose Adversarial Orthogonal Disentanglement (AOD), a latent geometric framework for mitigating LVLM hallucinations. AOD learns a hallucination-related direction through a minimax objective: a classifier concentrates hallucination signals into the projected component, while an adversary removes them from the orthogonal residual space via a Gradient Reversal Layer. The learned direction enables a training-free dual-forward-pass contrastive decoding strategy that suppresses hallucinations while preserving general capabilities. Experiments on three LVLMs across four hallucination and four utility benchmarks show that AOD consistently outperforms strong baselines. It improves POPE accuracy by over 6% on average, boosts AMBER by 6%, and maintains strong performance on utility tasks such as MMMU. Further analysis shows robust transfer across datasets, suggesting that AOD captures general hallucination-related biases rather than dataset-specific artifacts. Our source code and datasets are available at https://github.com/Hunter-Wrynn/AOD.
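The abstract does not say exactly how the two forward passes are combined. The sketch below assumes one pass on the original hidden state and one pass with the learned direction projected out, then applies the familiar contrastive-decoding rule used by methods such as Visual Contrastive Decoding, (1 + α)·logits_expert − α·logits_amateur. That combination rule, the toy output layer, and all names here are hypothetical stand-ins, not AOD's published decoding procedure.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_direction(h, u):
    """Return h minus its projection onto the unit vector u."""
    coeff = dot(h, u)
    return [a - coeff * b for a, b in zip(h, u)]


def lm_head(h, weight):
    """Toy output layer (hypothetical): one logit per vocabulary row."""
    return [dot(row, h) for row in weight]


def contrastive_logits(h, u, weight, alpha=1.0):
    """Dual forward pass followed by contrast (assumed VCD-style rule).

    Pass 1 ("amateur"): logits from the original hidden state.
    Pass 2 ("expert"): logits after removing the hallucination direction.
    Combined: (1 + alpha) * expert - alpha * amateur.
    """
    amateur = lm_head(h, weight)
    expert = lm_head(remove_direction(h, u), weight)
    return [(1 + alpha) * e - alpha * a for e, a in zip(expert, amateur)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


u = [1.0, 0.0]                                  # unit hallucination direction (toy)
h = [2.0, 1.0]                                  # toy hidden state
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3-token toy vocabulary
plain = lm_head(h, weight)
contrasted = contrastive_logits(h, u, weight, alpha=1.0)
greedy_plain = max(range(len(plain)), key=plain.__getitem__)
greedy_contrasted = max(range(len(contrasted)), key=contrasted.__getitem__)
print(greedy_plain, greedy_contrasted, softmax(contrasted))
```

Here token 0 is aligned with the hallucination direction: plain greedy decoding picks it (logits 2, 1, 1.5), while the contrasted logits (−2, 1, −0.5) shift the choice to token 1. The design point is that tokens whose score depends mainly on the hallucination component are penalized, while tokens supported by the residual are boosted.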
Problem

Research questions and friction points this paper is trying to address.

hallucination
Large Vision-Language Models
multimodal understanding
reliability
visual facts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Orthogonal Disentanglement
hallucination mitigation
latent geometric framework
contrastive decoding
Gradient Reversal Layer