Balancing Classification and Calibration Performance in Decision-Making LLMs via Calibration Aware Reinforcement Learning

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models often exhibit overconfidence in decision-making tasks, leading to unreliable confidence estimates that undermine trust in their outputs within downstream systems. To address the limitation of conventional reinforcement learning approaches—where decision tokens lack explicit confidence information—this work proposes a calibration-aware reinforcement learning method that, for the first time, explicitly integrates calibration objectives into the reinforcement learning loss function. This approach directly optimizes the probability distribution of decision tokens to jointly enhance both accuracy and confidence reliability. Experimental results demonstrate that the proposed method maintains high accuracy comparable to standard RLVR while significantly mitigating overconfidence, achieving up to a 9-point reduction in Expected Calibration Error (ECE).
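The paper reports calibration gains in terms of Expected Calibration Error (ECE). As background, a minimal sketch of the standard binned ECE metric (bin predictions by confidence, compare mean confidence to empirical accuracy per bin, weight by bin size); this is the textbook definition, not code from the paper:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: average |confidence - accuracy| over
    confidence bins, weighted by the fraction of samples per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # half-open bins (lo, hi]; samples at exactly 0.0 are ignored
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = correct[mask].mean()    # empirical accuracy in bin
        conf = confidences[mask].mean()  # mean confidence in bin
        ece += mask.mean() * abs(conf - acc)
    return ece
```

An overconfident model that answers with 0.9 confidence but is right only half the time scores an ECE of 0.4; a perfectly calibrated one scores 0.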

📝 Abstract
Large language models (LLMs) are increasingly deployed in decision-making tasks, where not only accuracy but also reliable confidence estimates are essential. Well-calibrated confidence enables downstream systems to decide when to trust a model and when to defer to fallback mechanisms. In this work, we conduct a systematic study of calibration in two widely used fine-tuning paradigms: supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). We show that while RLVR improves task performance, it produces extremely overconfident models, whereas SFT yields substantially better calibration, even under distribution shift, though with smaller performance gains. Through targeted experiments, we diagnose RLVR's failure, showing that decision tokens act as extraction steps of the decision in reasoning traces and do not carry confidence information, which prevents reinforcement learning from surfacing calibrated alternatives. Based on this insight, we propose a calibration-aware reinforcement learning formulation that directly adjusts decision-token probabilities. Our method preserves RLVR's accuracy level while mitigating overconfidence, reducing ECE by up to 9 points.
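The abstract does not give the exact loss, but one plausible sketch of how a calibration objective could be folded into a verifiable-reward signal is a Brier-style penalty on the decision token's probability: the function name, the penalty form, and the weight `lam` below are all assumptions for illustration, not the paper's formulation.

```python
import math

def calibration_aware_reward(logprob_decision, is_correct, lam=0.5):
    """Hypothetical shaped reward: task correctness minus a Brier-style
    penalty that pushes the decision token's probability toward the
    empirical correctness. `lam` trades accuracy against calibration."""
    p = math.exp(logprob_decision)            # decision-token probability
    task_reward = 1.0 if is_correct else 0.0
    calibration_penalty = (p - task_reward) ** 2  # squared-error (Brier) term
    return task_reward - lam * calibration_penalty
```

Under this sketch a confident correct answer (probability near 1) keeps its full reward, while a confident wrong answer is penalized beyond the plain RLVR reward of 0, which is the direction of pressure the paper's diagnosis of overconfident decision tokens suggests.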
Problem

Research questions and friction points this paper is trying to address.

calibration
large language models
decision-making
overconfidence
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Calibration-Aware Reinforcement Learning
Decision-Making LLMs
Overconfidence Mitigation
Decision Token Calibration
ECE Reduction