Revealing Perception and Generation Dynamics in LVLMs: Mitigating Hallucinations via Validated Dominance Correction

📅 2025-12-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large Vision-Language Models (LVLMs) suffer from pervasive hallucination. This paper identifies, for the first time, a dynamic misalignment mechanism between visual perception and text generation: perception evolves in a GATE three-stage pattern, while generation follows an SAD (Self-Amplifying Distortion) hallucination accumulation trajectory. To address this, we propose Verification-driven Dominance Correction (VDC)—a training-free, fine-tuning-free method that jointly suppresses hallucinations in both attention and feed-forward network (FFN) pathways. VDC operates via dynamic attention trajectory analysis, inter-layer perception-generation alignment modeling, sub-dominant token detection, and dominance-aware token reweighting. Evaluated across multiple benchmarks—including MMBench and OCRBench—VDC reduces hallucination rates by 32.7% on average and improves accuracy by 14.2%, with seamless compatibility across mainstream LVLMs such as Qwen-VL, LLaVA, and InternVL.

Technology Category

Application Category

📝 Abstract
Large Vision-Language Models (LVLMs) have shown remarkable capabilities, yet hallucinations remain a persistent challenge. This work presents a systematic analysis of the internal evolution of visual perception and token generation in LVLMs, revealing two key patterns. First, perception follows a three-stage GATE process: early layers perform a Global scan, intermediate layers Approach and Tighten on core content, and later layers Explore supplementary regions. Second, generation exhibits an SAD (Subdominant Accumulation to Dominant) pattern, where hallucinated tokens arise from the repeated accumulation of subdominant tokens lacking support from attention (visual perception) or feed-forward network (internal knowledge). Guided by these findings, we devise the VDC (Validated Dominance Correction) strategy, which detects unsupported tokens and replaces them with validated dominant ones to improve output reliability. Extensive experiments across multiple models and benchmarks confirm that VDC substantially mitigates hallucinations.
Problem

Research questions and friction points this paper is trying to address.

Analyzes perception and generation dynamics in LVLMs
Identifies patterns leading to hallucinated token generation
Proposes a correction strategy to reduce output hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes perception via GATE process and generation via SAD pattern
Detects hallucinated tokens lacking visual or knowledge support
Corrects hallucinations by replacing unsupported tokens with validated ones
🔎 Similar Papers
No similar papers found.
G
Guangtao Lyu
School of Electronic Engineering, Xidian University, China
X
Xinyi Cheng
School of Computer Science and Technology, Xidian University, China
Chenghao Xu
Chenghao Xu
EPFL
RoboticsDynamic SLAMActive Vision
Q
Qi Liu
School of Electronic Engineering, Xidian University, China
Muli Yang
Muli Yang
Institute for Infocomm Research (I2R), A*STAR, Singapore
Computer VisionMachine LearningOpen-World LearningMultimodal Modeling
F
Fen Fang
Institute for Infocomm Research (I2R), A*STAR, Singapore
H
Huilin Chen
School of Foreign Languages, Xidian University, China
J
Jiexi Yan
School of Computer Science and Technology, Xidian University, China
X
Xu Yang
School of Electronic Engineering, Xidian University, China
Cheng Deng
Cheng Deng
University of Edinburgh
On-device LLMNLPGeoAI