HTDC: Hesitation-Triggered Differential Calibration for Mitigating Hallucination in Large Vision-Language Models

📅 2026-04-13

🤖 AI Summary

This work addresses hallucination in large vision-language models, which often stems from unstable visual grounding and over-reliance on language priors. Existing training-free decoding methods mitigate hallucinations by calibrating at every generation step, which adds redundant computation and can disrupt otherwise stable predictions. To overcome these limitations, the authors propose a “hesitation-triggered” mechanism that monitors fluctuations in token preference across intermediate layers to identify high-risk decoding steps. Only when such instability is detected does the method deploy lightweight visual-nullification and semantic-nullification probes for differential calibration. All other steps use standard autoregressive inference. Across multiple hallucination benchmarks, this approach consistently reduces hallucinations without compromising task accuracy, achieving a favorable balance between reliability and computational efficiency.
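The hesitation signal can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each intermediate layer's hidden state has already been read out as vocabulary logits (a logit-lens-style projection). It then measures hesitation as the fraction of adjacent layers whose top-1 token differs. The names `hesitation_score` and `is_hesitating`, along with the `threshold` value, are hypothetical.

```python
from typing import List, Sequence


def top1(logits: Sequence[float]) -> int:
    """Index of the highest-scoring vocabulary entry."""
    return max(range(len(logits)), key=lambda i: logits[i])


def hesitation_score(layer_logits: Sequence[Sequence[float]]) -> float:
    """Fraction of adjacent intermediate layers whose top-1 token differs.

    Assumption (not from the paper): layer_logits[k] are vocabulary logits
    read out from intermediate layer k, e.g. by applying the final norm and
    LM head to its hidden state. 0.0 means every layer agrees; 1.0 means the
    preferred token flips at every layer boundary.
    """
    if len(layer_logits) < 2:
        return 0.0
    picks: List[int] = [top1(layer) for layer in layer_logits]
    flips = sum(1 for a, b in zip(picks, picks[1:]) if a != b)
    return flips / (len(picks) - 1)


def is_hesitating(layer_logits: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> bool:
    """Hypothetical trigger: calibrate only when the score reaches threshold."""
    return hesitation_score(layer_logits) >= threshold


stable = [[0.1, 2.0, 0.3]] * 4  # every layer prefers token 1
unstable = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
print(hesitation_score(stable), is_hesitating(stable))      # 0.0 False
print(hesitation_score(unstable), is_hesitating(unstable))  # 1.0 True
```

Counting top-1 flips is only one plausible way to quantify "fluctuations in token preference." A divergence between per-layer distributions would be an equally reasonable assumption.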

📝 Abstract

Large vision-language models (LVLMs) achieve strong multimodal performance, but still suffer from hallucinations caused by unstable visual grounding and over-reliance on language priors. Existing training-free decoding methods typically apply calibration at every decoding step, introducing unnecessary computation and potentially disrupting stable predictions. We address this problem by identifying layer-wise hesitation, a simple signal of grounding instability reflected by fluctuations in token preference across intermediate layers. Based on this observation, we propose Hesitation-Triggered Differential Calibration (HTDC), a training-free decoding framework that preserves standard full-branch inference and activates calibration only at hesitation-prone steps. When triggered, HTDC contrasts the full branch with two lightweight probes, a visual-nullification probe and a semantic-nullification probe, to suppress hallucination-prone candidates while avoiding unnecessary intervention on stable steps. Experiments on representative hallucination benchmarks show that HTDC consistently reduces hallucinations while maintaining strong task accuracy, achieving a favorable trade-off between effectiveness and computational overhead.
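The gated decoding step described in the abstract can be sketched in the same way. The combination rule below is not HTDC's published formula. It is swapped in from contrastive decoding in the style of visual contrastive decoding (VCD). The calibrated score is `(1 + alpha + beta) * full - alpha * v_null - beta * s_null`, and it is restricted to candidates that remain plausible under the full branch. The functions, the weights `alpha` and `beta`, and the `cutoff` are all hypothetical. The probes are passed as zero-argument callables so they run only when a step is flagged as hesitating.

```python
import math
from typing import Callable, List, Sequence

Logits = Sequence[float]


def softmax(logits: Logits) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(scores: Logits) -> int:
    return max(range(len(scores)), key=lambda i: scores[i])


def differential_calibrate(full: Logits, v_null: Logits, s_null: Logits,
                           alpha: float = 1.0, beta: float = 1.0,
                           cutoff: float = 0.1) -> List[float]:
    """Hypothetical VCD-style contrast of the full branch against two probes.

    Tokens that stay likely when the image is nullified (v_null) or when the
    semantic content is nullified (s_null) are penalised. Candidates whose
    full-branch probability is below cutoff * max probability are masked to
    -inf so the contrast cannot promote implausible tokens.
    """
    probs = softmax(full)
    floor = cutoff * max(probs)
    calibrated = []
    for p, f, v, s in zip(probs, full, v_null, s_null):
        if p < floor:
            calibrated.append(float("-inf"))
        else:
            calibrated.append((1 + alpha + beta) * f - alpha * v - beta * s)
    return calibrated


def decode_step(full: Logits, hesitating: bool,
                visual_probe: Callable[[], Logits],
                semantic_probe: Callable[[], Logits], **kwargs) -> int:
    """Greedy step that evaluates the probes only on hesitation-prone steps."""
    if not hesitating:
        return argmax(full)  # stable step: standard full-branch decoding
    calibrated = differential_calibrate(full, visual_probe(), semantic_probe(), **kwargs)
    return argmax(calibrated)


# Toy example: token 0 is favoured by the full branch but also strongly by
# the image-free probe, so it looks driven by language priors.
full = [2.0, 1.9, -3.0]
v_null = [2.5, 0.0, 0.0]
s_null = [2.0, 0.5, 0.0]
print(decode_step(full, False, lambda: v_null, lambda: s_null))  # 0
print(decode_step(full, True, lambda: v_null, lambda: s_null))   # 1
```

Stable steps never call the probes, so they cost the same as ordinary greedy decoding. This mirrors the efficiency argument in the abstract, where only hesitation-prone steps pay for the extra forward passes.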

Problem

Research questions and friction points this paper is trying to address.

hallucination

large vision-language models

visual grounding

language priors

decoding calibration

Innovation

Methods, ideas, or system contributions that make the work stand out.

hesitation-triggered calibration

hallucination mitigation

vision-language models