Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the root causes of unreliable self-reported confidence in large language models (LLMs), focusing on DeepSeek R1-32B, and explores calibration mechanisms. Method: The authors show that explicit chain-of-thought (CoT) reasoning implicitly samples from the model's internal generation distribution, so that self-reported confidence emerges as a statistic of the alternative answers surfaced during inference; CoT itself thus acts as an intrinsic uncertainty-estimation process. The approach combines forced long-chain reasoning, semantic entropy estimation, and dual-model confidence reconstruction. Results: The framework substantially improves confidence calibration for factual question answering, and an independent reader model reconstructs highly correlated confidence scores (ρ > 0.9) from the reasoning chains alone, strongly supporting the "reasoning-as-sampling" hypothesis. The work provides both a novel theoretical perspective on trustworthy AI and a practical technical pathway for confidence calibration.
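
To make the semantic-entropy ingredient concrete, here is a minimal sketch: sample several answers to the same question, cluster them by meaning, and take the entropy of the cluster distribution. The `semantically_equivalent` helper is a hypothetical stand-in (exact match after normalization) for the bidirectional-entailment check typically used in the semantic-entropy literature, not the paper's implementation.

```python
import math

def semantically_equivalent(a: str, b: str) -> bool:
    # Crude stand-in for an NLI-based bidirectional-entailment check;
    # for illustration we treat normalized exact matches as equivalent.
    return a.strip().lower() == b.strip().lower()

def semantic_entropy(answers: list[str]) -> float:
    """Cluster sampled answers by meaning, then return the entropy of
    the cluster distribution (low entropy = samples mostly agree)."""
    clusters: list[list[str]] = []
    for a in answers:
        for c in clusters:
            if semantically_equivalent(a, c[0]):
                c.append(a)
                break
        else:
            clusters.append([a])
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Example: 10 samples, 8 of which agree.
samples = ["Paris"] * 8 + ["Lyon", "Marseille"]
print(semantic_entropy(samples))  # ~0.639 nats: low, i.e. high confidence
```

Low entropy means the sampled answers agree; the paper's argument is that a long chain-of-thought surfaces this same alternative-answer statistic implicitly, without explicit resampling.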

📝 Abstract
We study the source of uncertainty in DeepSeek R1-32B by analyzing its self-reported verbal confidence on question answering (QA) tasks. In the default answer-then-confidence setting, the model is regularly over-confident, whereas semantic entropy, obtained by sampling many responses, remains reliable. We hypothesize that this is because semantic entropy uses more test-time compute, which lets it explore the model's predictive distribution. We show that granting DeepSeek the budget to explore its distribution, by forcing a long chain-of-thought before the final answer, greatly improves the effectiveness of its verbal score, even on simple fact-retrieval questions that normally require no reasoning. Furthermore, a separate reader model that sees only the chain can reconstruct very similar confidences, indicating that the verbal score may be merely a statistic of the alternatives surfaced during reasoning. Our analysis concludes that reliable uncertainty estimation requires explicit exploration of the generative space, and that self-reported confidence is trustworthy only after such exploration.
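
A minimal sketch of the forced-CoT setup the abstract describes, assuming a generic text-in/text-out `chat` hook (not the authors' actual prompts or harness): instruct the model to reason at length and weigh alternatives before emitting a final answer line and a verbal confidence line, then parse both.

```python
import re

def answer_with_forced_cot(chat, question: str) -> tuple[str, float]:
    """Ask for a long chain of thought before the final answer, then
    parse out the answer and the self-reported confidence (0-100)."""
    prompt = (
        f"Question: {question}\n"
        "Before answering, reason step by step and explicitly consider "
        "alternative answers. Then finish with exactly two lines:\n"
        "ANSWER: <your answer>\n"
        "CONFIDENCE: <integer 0-100>"
    )
    reply = chat(prompt)  # assumed model hook: str -> str
    ans = re.search(r"ANSWER:\s*(.+)", reply)
    conf = re.search(r"CONFIDENCE:\s*(\d+)", reply)
    return (
        ans.group(1).strip() if ans else "",
        int(conf.group(1)) / 100 if conf else float("nan"),
    )
```

The design point, per the abstract, is that the reasoning budget itself does the work: the same confidence question asked without the long chain yields over-confident scores.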
Problem

Research questions and friction points this paper is trying to address.

Analyzing self-reported confidence in LLMs for QA tasks
Improving verbal confidence accuracy via chain-of-thought reasoning
Exploring generative space for reliable uncertainty estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using semantic entropy for reliable uncertainty estimation
Forcing chain-of-thought to explore predictive distribution
Separate reader model reconstructs confidence from reasoning (see the sketch after this list)
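
If the verbal score really is a statistic of the alternatives surfaced during reasoning, a reader model shown only the chain should rank examples by confidence much like the generator does. A minimal sketch of that agreement check, using `scipy.stats.spearmanr` on illustrative made-up scores (not the paper's data):

```python
from scipy.stats import spearmanr

# Illustrative paired scores (made up, not from the paper): the
# generator's self-reported confidence per question, and a reader
# model's confidence estimated from the reasoning chain alone.
generator_conf = [0.95, 0.40, 0.80, 0.55, 0.99, 0.30]
reader_conf = [0.90, 0.35, 0.60, 0.65, 0.97, 0.25]

rho, pvalue = spearmanr(generator_conf, reader_conf)
print(f"Spearman rho = {rho:.3f}")  # ~0.943 for these toy values
# A rho above 0.9, as the paper reports, suggests the chain alone
# carries nearly all the information in the self-reported score.
```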
Jakub Podolak
Informatics Institute, University of Amsterdam, The Netherlands
Rajeev Verma
PhD Student, University of Amsterdam
Decision theory · Statistics · Machine learning