Improving Metacognition and Uncertainty Communication in Language Models

📅 2025-09-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) often produce answers they are internally uncertain about without signaling that uncertainty, increasing the risk of erroneous downstream decisions; their explicit verbalized confidence is typically miscalibrated, poorly discriminating, and inconsistent across tasks. Method: A multi-task supervised finetuning framework trains models on general knowledge, mathematics, and open-ended trivia to perform both single-question confidence estimation and pairwise confidence comparison, then evaluates generalization to unseen medical and legal domains. Contribution/Results: The paper provides systematic empirical evidence that uncertainty calibration and discriminative capability do not automatically transfer across metacognitive tasks and must be co-optimized via multi-task training for robust generalization. Experiments demonstrate significant improvements: reduced calibration error (↓ECE), enhanced confidence discrimination (↑AUC), and preserved task accuracy—establishing a scalable, empirically grounded pathway toward improving LLM metacognitive reliability.

📝 Abstract
Large language models (LLMs) are increasingly used in decision-making contexts, but when they present answers without signaling low confidence, users may unknowingly act on erroneous outputs. While prior work shows that LLMs maintain internal uncertainty signals, their explicit verbalized confidence is typically miscalibrated and poorly discriminates between correct and incorrect answers. Across two types of LLMs, we investigate whether supervised finetuning can improve models' ability to communicate uncertainty and whether such improvements generalize across tasks and domains. We finetune the LLMs on datasets spanning general knowledge, mathematics, and open-ended trivia, and evaluate two metacognitive tasks: (1) single-question confidence estimation, where the model assigns a numeric certainty to its answer, and (2) pairwise confidence comparison, where the model selects which of two answers it is more likely to have correct. We assess generalization to unseen domains, including medical and legal reasoning. Results show that finetuning improves calibration (alignment between stated confidence and accuracy) and discrimination (higher confidence for correct vs. incorrect responses) within and across domains, while leaving accuracy unchanged. However, improvements are task-specific: training on single-question calibration does not transfer to pairwise comparison, and vice versa. In contrast, multitask finetuning on both forms of metacognition yields broader gains, producing lower calibration error and stronger discrimination in out-of-domain evaluations. These results show that while uncertainty communication in LLMs is trainable and generalizable, different metacognitive skills do not naturally reinforce one another and must be developed together through multitask training.
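The two evaluation metrics named in the abstract can be made concrete with a short sketch: expected calibration error (ECE) measures the gap between stated confidence and accuracy, and AUROC measures discrimination, i.e., how often a correct answer receives higher confidence than an incorrect one. This is a minimal NumPy illustration of the standard definitions, not the paper's evaluation code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the weighted average
    gap between mean confidence and empirical accuracy in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return ece

def discrimination_auc(confidences, correct):
    """AUROC: probability that a randomly chosen correct answer has higher
    confidence than a randomly chosen incorrect one (ties count as 0.5)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos, neg = confidences[correct], confidences[~correct]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

On this reading, the paper's headline result is that finetuning lowers `expected_calibration_error` and raises `discrimination_auc` on held-out domains without changing answer accuracy.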
Problem

Research questions and friction points this paper is trying to address.

LLMs present answers without signaling low confidence to users
Verbalized confidence in LLMs is miscalibrated and poorly discriminating
Uncertainty communication improvements don't transfer across metacognitive tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised finetuning improves uncertainty communication calibration
Multitask training enhances generalization across different domains
Task-specific metacognitive skills require combined training approach
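The multitask training approach above combines two supervision formats, one per metacognitive skill. The sketch below shows how such training examples might be constructed; the prompt templates and field names are illustrative assumptions, not the paper's actual finetuning format.

```python
def single_question_example(question, answer, confidence):
    """Single-question confidence estimation: the target is a numeric certainty."""
    prompt = (f"Q: {question}\nA: {answer}\n"
              "How confident are you that this answer is correct (0-100)?")
    return {"prompt": prompt, "target": str(int(round(confidence * 100)))}

def pairwise_comparison_example(q1, a1, q2, a2, first_more_likely):
    """Pairwise confidence comparison: the target picks the answer the model
    is more likely to have correct."""
    prompt = (f"Question 1: {q1}\nYour answer: {a1}\n"
              f"Question 2: {q2}\nYour answer: {a2}\n"
              "Which answer are you more confident is correct, 1 or 2?")
    return {"prompt": prompt, "target": "1" if first_more_likely else "2"}

# A multitask finetuning set mixes both formats, since training on
# either skill alone does not transfer to the other.
examples = [
    single_question_example("What is 7*8?", "56", 0.95),
    pairwise_comparison_example("Capital of France?", "Paris",
                                "Capital of Australia?", "Sydney", True),
]
```

Mixing both example types in one training set is what the summary calls the "combined training approach"; the reported gains come from this mixture rather than from either format alone.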