The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the impact of chain-of-thought (CoT) reasoning on uncertainty quantification (UQ) in vision-language models. While CoT enhances task accuracy, it induces overconfidence by implicitly conditioning answers, thereby undermining the reliability of UQ methods. This study is the first to uncover this mechanism and systematically evaluates a range of UQ approaches under CoT reasoning. Experimental results demonstrate that mainstream UQ techniques suffer significant performance degradation when applied within CoT frameworks. In contrast, consistency-based UQ methods not only maintain robustness but also exhibit improved reliability as the reasoning chain lengthens. These findings offer a viable pathway toward trustworthy vision-language reasoning in high-stakes applications where calibrated uncertainty estimates are critical.

📝 Abstract
Vision-language models (VLMs) are increasingly deployed in high-stakes settings where reliable uncertainty quantification (UQ) is as important as predictive accuracy. Extended reasoning via chain-of-thought (CoT) prompting or reasoning-trained models has become ubiquitous in modern VLM pipelines, yet its effect on UQ reliability remains poorly understood. We show that reasoning consistently degrades the quality of most uncertainty estimates, even when it improves task accuracy. We identify implicit answer conditioning as the primary mechanism: as reasoning traces converge on a conclusion before the final answer is generated, token probabilities increasingly reflect consistency with the model's own reasoning trace rather than uncertainty about correctness. In effect, the model becomes overconfident in its answer. In contrast, agreement-based consistency remains robust and often improves under reasoning, making it a practical choice for uncertainty estimation in reasoning-enabled VLMs.
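The agreement-based consistency approach the abstract highlights can be sketched minimally: sample several answers from the model at nonzero temperature and use the rate of agreement with the majority answer as a confidence score. The function below is an illustrative sketch of this family of methods, not the paper's implementation; the sampled answers are stand-ins for stochastic VLM outputs.

```python
from collections import Counter

def agreement_confidence(answers):
    """Agreement-based consistency score: return the majority answer and
    the fraction of samples that agree with it. Sketch of the general
    consistency-based UQ idea, not the paper's exact method."""
    counts = Counter(answers)
    majority, n_majority = counts.most_common(1)[0]
    return majority, n_majority / len(answers)

# Hypothetical example: five stochastic samples from a VLM on one query.
samples = ["cat", "cat", "dog", "cat", "cat"]
answer, confidence = agreement_confidence(samples)
print(answer, confidence)  # cat 0.8
```

Unlike token-probability confidences, this score is computed across independent samples, so it is not conditioned on any single reasoning trace, which is consistent with the abstract's claim that it remains robust under extended reasoning.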
Problem

Research questions and friction points this paper is trying to address.

uncertainty quantification
chain-of-thought
vision-language models
overconfidence
reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

chain-of-thought
uncertainty quantification
vision-language models
overconfidence
implicit answer conditioning
Robert Welch
KTH Royal Institute of Technology, Stockholm, Sweden; Science for Life Laboratory, Stockholm, Sweden
Emir Konuk
KTH Royal Institute of Technology, Stockholm, Sweden; Science for Life Laboratory, Stockholm, Sweden
Kevin Smith
Professor, KTH Royal Institute of Technology & Science for Life Laboratory
Computer Vision · Machine Learning · Biomedical Image Analysis · Medical Image Analysis