Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently exhibit miscalibration—overconfidence or underconfidence—in factual question answering, undermining their reliable deployment. To address this, we propose an unsupervised reinforcement learning (RL) framework for confidence calibration: we formalize calibration as a betting game and design a theoretically grounded dual-penalty reward function that jointly penalizes over- and under-confidence based on optimal calibration theory. Using the PPO algorithm, we jointly optimize answer generation and confidence score prediction. Crucially, our method is the first to explicitly embed theoretical optimality conditions for calibration into the RL reward, requiring no human-annotated confidence labels, enabling model-intrinsic calibration, and supporting zero-shot cross-task generalization. Experiments across multiple benchmarks demonstrate a 42% average reduction in Expected Calibration Error (ECE); notably, strong calibration persists even on unseen tasks, confirming the trainability of intrinsic confidence awareness in LLMs.
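The summary reports calibration quality via Expected Calibration Error (ECE). As a quick reference, here is a minimal sketch of how ECE is commonly computed with equal-width confidence binning; the bin count and implementation details are generic assumptions, not taken from the paper:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between mean confidence and accuracy
    within equal-width confidence bins (weights = bin occupancy)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += mask.mean() * gap  # occupancy-weighted calibration gap
    return ece
```

A perfectly calibrated model (e.g. 75% confidence with 75% accuracy) yields an ECE of 0, while systematic overconfidence inflates it.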

📝 Abstract
Safe and trustworthy use of Large Language Models (LLMs) requires an accurate expression of confidence in their answers. We introduce a novel Reinforcement Learning (RL) approach for LLM calibration that fine-tunes LLMs to elicit calibrated confidence estimations in their answers to factual questions. We model the problem as a betting game where the model predicts a confidence score together with every answer, and design a reward function that penalizes both over- and under-confidence. We prove that under our reward design an optimal policy would result in a perfectly calibrated confidence estimation. Our experiments demonstrate significantly improved confidence calibration and generalization to new tasks without re-training, indicating that our approach teaches a general confidence awareness. This approach enables the training of inherently calibrated LLMs.
Problem

Research questions and friction points this paper is trying to address.

Improves confidence calibration in Large Language Models.
Uses Reinforcement Learning to fine-tune confidence estimations.
Ensures accurate confidence scores for factual question answers.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning for LLM calibration
Betting game model for confidence estimation
Reward function that penalizes both over- and under-confidence
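The page does not reproduce the paper's exact dual-penalty reward. As an illustration of the key property the abstract claims (an optimal policy states perfectly calibrated confidences), here is a minimal sketch using the logarithmic proper scoring rule, a standard reward with that property; this is an assumption for illustration, not the authors' formula:

```python
import math

def calibration_reward(confidence, is_correct, eps=1e-6):
    """Logarithmic proper scoring rule (illustrative stand-in).

    Expected reward is maximized only when the stated confidence
    equals the true probability of the answer being correct, so it
    penalizes both over- and under-confidence.
    """
    c = min(max(confidence, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(c) if is_correct else math.log(1.0 - c)
```

For example, if a model is correct 70% of the time on some question type, its expected reward is highest when it reports 0.7, not a hedged 0.5 or an overconfident 0.9.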
Paul Stangel
School of Computation, Information and Technology, Technical University of Munich; Munich Center for Machine Learning
David Bani-Harouni
Technical University of Munich
Chantal Pellegrini
Technical University of Munich
Ege Ozsoy
School of Computation, Information and Technology, Technical University of Munich; Munich Center for Machine Learning
Kamilia Zaripova
School of Computation, Information and Technology, Technical University of Munich; Munich Center for Machine Learning
Matthias Keicher
Technische Universität München
N. Navab
School of Computation, Information and Technology, Technical University of Munich; Munich Center for Machine Learning