TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-language models (VLMs) are prone to hallucination in image captioning due to over-reliance on linguistic priors, compromising reliability in high-stakes applications. To address this, we propose a parameter-free, training-free temporal logits consistency modeling framework. Our method introduces Temporal Prediction Connection (TPC), the first mechanism to explicitly enforce semantic continuity across autoregressive decoding steps by linking logits temporally. It integrates temperature scaling with cross-step consistency regularization to stabilize generation without architectural modification. Evaluated across multiple benchmarks, our approach reduces hallucination rates by 23.6% on average, while improving caption accuracy and textual coherence. Crucially, it incurs no computational overhead—preserving inference speed—and maintains strong open-domain robustness. The method requires neither additional parameters nor fine-tuning, offering a lightweight, plug-and-play solution for hallucination mitigation in VLM-based captioning.

📝 Abstract
Vision-language models (VLMs) have achieved remarkable advancements, capitalizing on the impressive capabilities of large language models (LLMs) across diverse tasks. Despite this, a critical challenge known as hallucination occurs when models overconfidently describe objects or attributes absent from the image, a problem exacerbated by the tendency of VLMs to rely on linguistic priors. This limitation reduces model reliability in high-stakes applications. In this work, we observe that enforcing continuity consistency among logits enhances generation quality, and we introduce a straightforward, efficient method, Cross-Temporal Prediction Connection (TPC), designed to strengthen the semantic consistency of logits by connecting them temporally across timesteps. TPC amplifies information flow and improves coherence, effectively reducing hallucination. Extensive experiments show that TPC surpasses existing representative methods, delivering superior performance in both accuracy and efficiency while maintaining robustness in open-ended text generation tasks.
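The abstract describes TPC as connecting logits temporally across decoding steps, combined with temperature scaling. The paper's exact formulation is not given here, so the sketch below is a minimal illustration under assumptions: it blends each step's temperature-scaled logits with the previous step's fused logits via an additive connection with a hypothetical weight `alpha`. The function name `tpc_step` and all parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def tpc_step(curr_logits, prev_logits, alpha=0.5, temperature=1.0):
    """Hypothetical cross-temporal prediction connection step: blend the
    current step's temperature-scaled logits with the previous step's
    fused logits to encourage semantic continuity across timesteps."""
    scaled = curr_logits / temperature
    if prev_logits is None:  # first decoding step has no history
        return scaled
    return scaled + alpha * prev_logits  # additive connection (assumed form)

# Toy greedy decoding loop over a 5-token vocabulary with random logits.
rng = np.random.default_rng(0)
prev = None
tokens = []
for _ in range(4):
    raw = rng.normal(size=5)            # stand-in for the model's logits
    fused = tpc_step(raw, prev, alpha=0.5, temperature=0.8)
    tokens.append(int(np.argmax(fused)))
    prev = fused
print(tokens)
```

Because the connection only reuses logits already computed at the previous step, a scheme like this adds no parameters and essentially no extra compute, which is consistent with the "training-free, overhead-free" claim in the summary.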
Problem

Research questions and friction points this paper is trying to address.

Reduces hallucination in vision-language models
Enhances semantic consistency across timesteps
Improves reliability in high-stakes applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Temporal Prediction Connection (TPC) introduced
Enhances semantic consistency of logits temporally
Reduces hallucination in vision-language models effectively
Chao Wang
Shanghai University
Weiwei Fu
Fudan University
data assimilation · inverse model · biogeochemical cycles
Yang Zhou
Shanghai University