Mitigating Hallucinations in Large Vision-Language Models via Causal Route Gating

📅 2026-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large vision-language models (LVLMs) often generate content that is inconsistent with the input image. This work attributes a key failure mode to route competition within attention heads: even when visual tokens receive attention, the textual pathway can dominate the final token decision. It proposes a training-free intervention that splits each attention head into a visual route and a text route and estimates each route's token-level effect with a one-forward/one-gradient approximation. These estimates identify prior-dominant heads, whose text route is selectively suppressed while the visual route is left intact. Across five benchmarks covering discriminative and generative settings, the approach consistently reduces hallucination-related errors with limited impact on overall multimodal performance and a modest inference-time overhead.
📝 Abstract
Large vision-language models (LVLMs) often hallucinate content that is fluent yet unsupported by the image, limiting their reliability in real-world deployment. We show that a key failure mode arises from route competition: even when visual tokens receive attention, the final token decision can be dominated by the textual pathway, causing the decoder to follow linguistic priors over visual evidence. To mitigate this, we propose a training-free, decision-aligned intervention that decomposes each attention head into a visual route and a text route, and estimates their token-level effects using an efficient one-forward/one-gradient approximation. These estimates reveal route conflict within heads and identify prior-dominant ones, enabling selective suppression of only the text route while keeping the visual route intact. Across five benchmarks spanning discriminative and generative settings, our method consistently reduces hallucination-related errors across models with limited impact on overall multimodal performance, while incurring a modest inference-time overhead.
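The route decomposition described in the abstract can be illustrated with a toy single-head example. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `route_effects` and `gate_head`, the first-order effect score (gradient of the chosen token's logit dotted with each route's contribution), the dominance rule, and the `strength` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def route_effects(q, K, V, is_visual, grad_out):
    """Split one head's output into a visual route and a text route.

    q         : (d,) query at the current decoding position
    K, V      : (n, d) keys and values of the context tokens
    is_visual : (n,) boolean mask, True for image tokens
    grad_out  : (d,) gradient of the chosen token's logit w.r.t. the
                head output (assumed to come from one backward pass)
    """
    attn = softmax(K @ q / np.sqrt(q.shape[0]))
    # The attention weights are shared; only the token subset differs,
    # so o_vis + o_txt equals the ordinary head output.
    o_vis = (attn * is_visual) @ V
    o_txt = (attn * ~is_visual) @ V
    # Assumed first-order estimate of each route's effect on the logit.
    eff_vis = float(grad_out @ o_vis)
    eff_txt = float(grad_out @ o_txt)
    return o_vis, o_txt, eff_vis, eff_txt


def gate_head(o_vis, o_txt, eff_vis, eff_txt, strength=1.0):
    """Hypothetical gating rule: damp only the text route of a head
    whose estimated text effect outweighs its visual effect."""
    prior_dominant = abs(eff_txt) > abs(eff_vis)
    scale = 1.0 - strength if prior_dominant else 1.0
    return o_vis + scale * o_txt, prior_dominant
```

In this sketch the visual route is never rescaled, which mirrors the abstract's statement that only the text route is suppressed; how the paper actually scores routes and selects heads is not specified in the text above.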
Problem

Research questions and friction points this paper is trying to address.

hallucination
large vision-language models
route competition
visual evidence
linguistic priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal route gating
hallucination mitigation
vision-language models
attention decomposition
training-free intervention
Zhe Cheng
Center of Statistical Research, School of Statistics and Data Science, Southwestern University of Finance and Economics, Chengdu, China
Wenyu Chen
Massachusetts Institute of Technology
optimization, statistical learning
Fode Zhang
Center of Statistical Research, School of Statistics and Data Science, Southwestern University of Finance and Economics, Chengdu, China
Dehuan Shen
Department of Biomedical Engineering, College of Design and Engineering, National University of Singapore, Singapore