AI Summary
This work addresses the unreliability of predictive confidence in deep neural networks, a problem often made worse by existing calibration methods that sacrifice model refinement: the ability to assign clearly different confidence scores to correct and incorrect predictions. To overcome this trade-off, the authors propose RefCal, a unified training framework that jointly optimizes accuracy, calibration, and refinement end to end. RefCal builds on a novel refinement-promoting loss that can be optimized through supervised contrastive learning, in contrast to post-hoc approaches that improve calibration metrics by merely mimicking training-time uncertainty. On CIFAR-100-LT with 10 percent class imbalance, RefCal achieves 58.81% accuracy, a refinement score of 95.67, and an expected calibration error (ECE) of 0.08, substantially outperforming the widely used Correctness Ranking Loss (46.27, 93.7, 0.22) and improving the reliability of model decisions.
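The summary states that RefCal's refinement loss is optimized through supervised contrastive learning. As background only, here is a minimal NumPy sketch of the standard supervised contrastive (SupCon) loss of Khosla et al. (2020), in its "L_out" form. It is not RefCal's refinement loss, whose exact form is not given here; the function name `supcon_loss` and the default temperature of 0.1 are illustrative assumptions.

```python
import numpy as np


def supcon_loss(features: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Standard supervised contrastive loss (Khosla et al., 2020), "L_out" form.

    Hypothetical helper for illustration, not RefCal's loss.
    features: (N, D) embeddings with N >= 2; labels: (N,) integer class ids.
    Assumes at least one anchor has a same-label partner in the batch.
    """
    # L2-normalise so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # An anchor is never contrasted with itself.
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable log-softmax over every other sample in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Positives: samples with the same label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    has_pos = pos_counts > 0
    # Average log-probability of the positives for each anchor that has any.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = pos_log_prob[has_pos] / pos_counts[has_pos]
    return float(-mean_log_prob_pos.mean())


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    # Labels that match the embedding clusters give a low loss...
    print(supcon_loss(feats, np.array([0, 0, 1, 1])))
    # ...while labels that cut across the clusters give a high one.
    print(supcon_loss(feats, np.array([0, 1, 0, 1])))
```

Minimizing this loss pulls same-label embeddings together and pushes different-label embeddings apart; how RefCal turns that structure into a refinement objective is specific to the paper and is not reproduced here.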
Abstract
Although deep neural networks (DNNs) achieve high predictive accuracy, their confidence estimates are often unreliable, potentially compromising user trust in their decisions. This has motivated research on calibrated models, where calibration measures how well a model's predicted confidence aligns with the empirical probability of correctness. However, calibration metrics can often be improved through post-processing techniques that merely mimic training-time uncertainty without genuinely improving the model's understanding. For this reason, statisticians recommend that models be not only calibrated but also refined. Intuitively, a model is considered more refined if it assigns significantly different confidence scores to correct and incorrect predictions, a property also referred to as sharpness. We observe that many existing calibration methods improve calibration at the cost of reduced refinement. To address this limitation, we propose: (1) a novel loss function that explicitly promotes refinement and can be optimized through supervised contrastive learning; and (2) a unified training framework, RefCal, that jointly optimizes calibration, refinement, and accuracy to improve DNN reliability. On the CIFAR-100-LT dataset with 10 percent class imbalance, RefCal achieves (accuracy, refinement, ECE) of (58.81, 95.67, 0.08), substantially outperforming the widely used Correctness Ranking Loss, which achieves (46.27, 93.7, 0.22).
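The abstract contrasts calibration, measured by ECE, with refinement. As an illustration only, the sketch below computes ECE with equal-width confidence bins and, as one possible refinement measure, the AUROC of confidence for separating correct from incorrect predictions. The paper's exact refinement metric and binning scheme are not specified in this abstract, so both choices, the 15-bin default, and the function names `expected_calibration_error` and `refinement_auroc` are assumptions.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins: int = 15) -> float:
    """Equal-width-bin ECE: sum_b (|B_b| / n) * |acc(B_b) - conf(B_b)|."""
    conf = np.asarray(confidences, dtype=float)
    hits = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are (lo, hi]; a softmax max-probability is always > 0.
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(hits[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() == |B_b| / n
    return float(ece)


def refinement_auroc(confidences, correct) -> float:
    """AUROC of confidence as a score separating correct from incorrect predictions.

    Equals P(conf of a random correct prediction > conf of a random incorrect one),
    counting ties as 1/2. Assumed refinement measure, for illustration only.
    """
    conf = np.asarray(confidences, dtype=float)
    ok = np.asarray(correct, dtype=bool)
    pos, neg = conf[ok], conf[~ok]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one correct and one incorrect prediction")
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


if __name__ == "__main__":
    correct = np.array([1, 1, 1, 0])
    # Constant confidence equal to accuracy: perfectly calibrated,
    # but the confidence carries no information about correctness.
    flat = np.full(4, 0.75)
    print(expected_calibration_error(flat, correct), refinement_auroc(flat, correct))
    # Higher confidence on correct predictions: well separated, less well calibrated.
    sharp = np.array([0.9, 0.85, 0.8, 0.3])
    print(expected_calibration_error(sharp, correct), refinement_auroc(sharp, correct))
```

The toy usage makes the abstract's concern concrete: a constant confidence equal to the accuracy yields zero ECE while an AUROC of 0.5 shows it cannot distinguish correct from incorrect predictions, whereas the second set of confidences separates them perfectly at the cost of a nonzero ECE.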