Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context

📅 2025-10-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the root cause of exacerbated hallucination in large vision-language models (LVLMs) during long-text generation, revealing that the primary driver is not output length per se, but rather increased contextual dependency—leading to semantic drift and factual inconsistency. To address this, we propose a novel “induce–detect–suppress” paradigm: (1) controllable context perturbation to actively induce object-level hallucinations; (2) a lightweight detection module identifying high-risk generation steps; and (3) dynamic token-level suppression during autoregressive decoding. Experiments establish, for the first time, a strong correlation between contextual dependency and hallucination rate. Our method reduces hallucination by 32.7% on average across multiple visual question answering benchmarks, achieves 91.4% detection accuracy, and improves answer consistency and reliability—providing both mechanistic insight into hallucination formation and a deployable solution for trustworthy LVLM generation.

Technology Category

Application Category

📝 Abstract
Large Vision-Language Models (LVLMs) have made significant progress in recent years but are also prone to hallucination issues. They exhibit more hallucinations in longer, free-form responses, often attributed to accumulated uncertainties. In this paper, we ask: Does increased hallucination result solely from length-induced errors, or is there a deeper underlying mechanism? After a series of preliminary experiments and findings, we suggest that the risk of hallucinations is not caused by length itself but by the increased reliance on context for coherence and completeness in longer responses. Building on these insights, we propose a novel "induce-detect-suppress" framework that actively induces hallucinations through deliberately designed contexts, leverages induced instances for early detection of high-risk cases, and ultimately suppresses potential object-level hallucinations during actual decoding. Our approach achieves consistent, significant improvements across all benchmarks, demonstrating its efficacy. The strong detection and improved hallucination mitigation not only validate our framework but, more importantly, re-validate our hypothesis on context. Rather than solely pursuing performance gains, this study aims to provide new insights and serves as a first step toward a deeper exploration of hallucinations in LVLMs' longer responses.
Problem

Research questions and friction points this paper is trying to address.

Investigates why LVLMs hallucinate more in longer responses
Identifies context reliance as key cause of hallucination accumulation
Proposes framework to detect and suppress object-level hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Induces hallucinations via designed contexts
Detects high-risk cases using induced instances
Suppresses object-level hallucinations during decoding
🔎 Similar Papers
No similar papers found.
G
Ge Zheng
School of Computer Science and Engineering, Sun Yat-sen University
J
Jiaye Qian
ShanghaiTech University
J
Jiajin Tang
ShanghaiTech University
Sibei Yang
Sibei Yang
Associate Professor, School of Computer Science and Engineering, Sun Yat-Sen University