How Do LLMs Compute Verbal Confidence

📅 2026-03-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates how large language models (LLMs) generate verbalized confidence: whether such confidence is computed on the fly when requested or automatically cached during response generation, and whether it relies on token log-probabilities or on richer representations of answer quality. Applying a suite of interpretability techniques (activation steering, attention blocking, and representation probing) to Gemma 3 27B and Qwen 2.5 7B, the work shows that verbalized confidence stems from a self-evaluation representation that is automatically cached at the first post-answer position during answer generation, rather than being reconstructed post hoc. This cached representation explains substantially more variance in verbalized confidence than token log-probabilities do, indicating a richer evaluation of answer quality than a simple fluency readout and offering a new perspective on LLM metacognitive mechanisms.

📝 Abstract
Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed -- just-in-time when requested, or automatically during answer generation and cached for later retrieval; and second, what verbal confidence represents -- token log-probabilities, or a richer evaluation of answer quality? Focusing on Gemma 3 27B and Qwen 2.5 7B, we provide convergent evidence for cached retrieval. Activation steering, patching, noising, and swap experiments reveal that confidence representations emerge at answer-adjacent positions before appearing at the verbalization site. Attention blocking pinpoints the information flow: confidence is gathered from answer tokens, cached at the first post-answer position, then retrieved for output. Critically, linear probing and variance partitioning reveal that these cached representations explain substantial variance in verbal confidence beyond token log-probabilities, suggesting a richer answer-quality evaluation rather than a simple fluency readout. These findings demonstrate that verbal confidence reflects automatic, sophisticated self-evaluation -- not post-hoc reconstruction -- with implications for understanding metacognition in LLMs and improving calibration.
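The probing and variance-partitioning logic described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's actual pipeline: the dimensions, the noise levels, and the `probe_r2` helper are all assumptions. The idea is to fit linear probes predicting verbal confidence from (a) token log-probabilities alone and (b) cached hidden states plus log-probabilities, and read off the variance that the hidden states explain beyond log-probs.

```python
# Hedged sketch of linear probing + variance partitioning (synthetic data).
# In the paper this would use real hidden states from the first post-answer
# position; here "hidden" is random and "quality" is a planted linear signal.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d_hidden = 500, 64

# Synthetic stand-ins: hidden state at the first post-answer position,
# a latent answer-quality signal linearly embedded in it, and answer
# log-probabilities that only partially track that signal.
hidden = rng.normal(size=(n, d_hidden))
quality = hidden @ rng.normal(size=d_hidden)      # linearly decodable signal
logprob = 0.5 * quality + 2.0 * rng.normal(size=n)  # noisy fluency proxy
confidence = quality + 0.1 * rng.normal(size=n)     # verbalized score

def probe_r2(X, y):
    """Cross-validated R^2 of a ridge probe (assumed helper, not from the paper)."""
    return cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()

r2_logprob = probe_r2(logprob.reshape(-1, 1), confidence)
r2_hidden = probe_r2(hidden, confidence)
r2_both = probe_r2(np.column_stack([hidden, logprob]), confidence)

# Variance in verbal confidence explained by activations beyond log-probs:
unique_hidden = r2_both - r2_logprob
print(f"log-prob probe R^2: {r2_logprob:.3f}")
print(f"hidden-state probe R^2: {r2_hidden:.3f}")
print(f"unique to hidden states: {unique_hidden:.3f}")
```

On this toy setup the hidden-state probe recovers the confidence signal almost perfectly, while the log-prob probe does not, mirroring the abstract's claim that cached representations carry information beyond token log-probabilities.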
Problem

Research questions and friction points this paper is trying to address.

verbal confidence
large language models
uncertainty estimation
self-evaluation
metacognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

verbal confidence
cached retrieval
activation steering
linear probing
metacognition
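Activation steering, listed among the contributions above, can be illustrated with a small sketch. Everything here is synthetic and assumed for illustration: a "confidence direction" is planted, the steering vector is the classic difference-of-means construction, and the intervention is a simple additive shift. In the paper this kind of intervention would be applied to real model hidden states (e.g. via forward hooks), not random vectors.

```python
# Hedged sketch of activation steering on synthetic activations.
# Steering vector = mean(high-confidence states) - mean(low-confidence states);
# adding it to an activation pushes it along the confidence direction.
import numpy as np

rng = np.random.default_rng(1)
d = 32
conf_dir = rng.normal(size=d)
conf_dir /= np.linalg.norm(conf_dir)   # planted "confidence" direction

# Synthetic activations: high-confidence states project positively on
# conf_dir, low-confidence states project negatively.
high = rng.normal(size=(100, d)) + 2.0 * conf_dir
low = rng.normal(size=(100, d)) - 2.0 * conf_dir

steering_vector = high.mean(axis=0) - low.mean(axis=0)

x = rng.normal(size=d)                 # activation to intervene on
alpha = 1.0                            # steering strength (assumed knob)
x_steered = x + alpha * steering_vector

before = float(x @ conf_dir)
after = float(x_steered @ conf_dir)
print(f"projection on confidence direction: {before:.2f} -> {after:.2f}")
```

The steered activation's projection on the confidence direction increases, which is the mechanism steering experiments exploit to test whether a representation causally drives the verbalized score.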