🤖 AI Summary
Existing neural network calibration methods predominantly rely on global post-hoc adjustments that treat all predictions uniformly, ignoring the heterogeneous reliability of individual predictions and leaving unclear how calibration improvements translate into uncertainty-aware decision-making.
Method: We propose an uncertainty-aware hierarchical post-processing calibration framework. It introduces instance-level reliability estimation and hierarchically groups samples by semantic similarity in feature space. A dual-calibration strategy then separately refines the confidence distributions of putatively correct and putatively incorrect predictions, improving calibration accuracy while steering unreliable predictions toward low confidence, all without model retraining.
Results: Our method consistently improves both probabilistic output quality and uncertainty quantification. On CIFAR-10/100, it substantially reduces high-confidence error rates, achieves Expected Calibration Error (ECE) competitive with isotonic regression and focal-loss baselines, and improves empirical conformal coverage and uncertainty discrimination.
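The instance-level grouping step can be illustrated with a minimal sketch: mark a sample as putatively correct when most of its nearest neighbors in feature space were predicted correctly. The function name, the k-NN majority rule, and the Euclidean metric are illustrative assumptions, not the paper's exact proximity-based conformal procedure.

```python
import numpy as np

def stratify_by_proximity(test_feats, cal_feats, cal_correct, k=5):
    """Return a boolean mask over test_feats: True = putatively correct.

    test_feats:  (m, d) feature vectors of samples to stratify
    cal_feats:   (n, d) feature vectors of calibration samples
    cal_correct: (n,) 1 if the calibration sample was predicted correctly
    """
    flags = []
    for x in test_feats:
        dists = np.linalg.norm(cal_feats - x, axis=1)      # distance to every calibration sample
        nearest = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
        flags.append(cal_correct[nearest].mean() >= 0.5)   # majority vote on neighbor correctness
    return np.array(flags)
```

Any reasonable proximity rule (kernel weights, conformal p-values) could replace the majority vote; the point is that reliability is estimated per instance rather than globally.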
📝 Abstract
Despite extensive research on neural network calibration, existing methods typically apply global transformations that treat all predictions uniformly, overlooking the heterogeneous reliability of individual predictions. Furthermore, the relationship between improved calibration and effective uncertainty-aware decision-making remains largely unexplored. This paper presents a post-hoc calibration framework that leverages prediction reliability assessment to jointly enhance calibration quality and uncertainty-aware decision-making. The framework employs proximity-based conformal prediction to stratify calibration samples into putatively correct and putatively incorrect groups based on semantic similarity in feature space. A dual calibration strategy is then applied: standard isotonic regression calibrates confidence for putatively correct predictions, while underconfidence-regularized isotonic regression shrinks confidence toward a uniform distribution for putatively incorrect predictions, facilitating their identification for further investigation. A comprehensive evaluation is conducted using calibration metrics, uncertainty-aware performance measures, and empirical conformal coverage. Experiments on CIFAR-10 and CIFAR-100 with BiT and CoAtNet backbones show that the proposed method produces fewer confidently incorrect predictions and achieves competitive Expected Calibration Error compared with isotonic and focal-loss baselines. This work bridges calibration and uncertainty quantification through instance-level adaptivity, offering a practical post-hoc solution that requires no model retraining while improving both probability alignment and uncertainty-aware decision-making.
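The dual calibration strategy can be sketched as follows, assuming top-label confidences and a pool-adjacent-violators (PAV) isotonic fit. The shrinkage toward the uniform probability 1/n_classes is a simple stand-in for the paper's underconfidence-regularized isotonic regression; the function names and the parameter `lam` are assumptions for illustration.

```python
import numpy as np

def pav(y):
    """Pool Adjacent Violators: non-decreasing least-squares fit to y."""
    vals, wts = [], []
    for v in y:
        vals.append(float(v)); wts.append(1.0)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]                      # merge violating blocks
            merged = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            vals[-2:], wts[-2:] = [merged], [w]
    fitted = []
    for v, w in zip(vals, wts):
        fitted += [v] * int(round(w))                  # expand blocks back out
    return np.array(fitted)

def fit_isotonic(conf, correct):
    """Return a map raw confidence -> calibrated probability."""
    order = np.argsort(conf)
    x, y = np.asarray(conf)[order], pav(np.asarray(correct)[order])
    return lambda c: float(np.interp(c, x, y))         # interpolate, clip at ends

def dual_calibrate(conf, correct, putative_correct, n_classes=10, lam=0.5):
    """Fit the two per-group calibrators on a held-out calibration set."""
    # Group 1: putatively correct -> standard isotonic regression.
    iso_ok = fit_isotonic(conf[putative_correct], correct[putative_correct])
    # Group 2: putatively incorrect -> isotonic fit whose output is shrunk
    # toward the uniform probability 1/n_classes (illustrative regularizer).
    iso_bad = fit_isotonic(conf[~putative_correct], correct[~putative_correct])
    uniform = 1.0 / n_classes
    def calibrate(c, is_putative_correct):
        if is_putative_correct:
            return iso_ok(c)
        return (1.0 - lam) * iso_bad(c) + lam * uniform
    return calibrate
```

Because the incorrect-group branch is pulled toward 1/n_classes, confidently wrong predictions end up with visibly deflated scores, which is what makes them easy to flag for further investigation.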