🤖 AI Summary
Neural networks commonly exhibit systematic overconfidence, posing safety risks in critical deployment scenarios. Existing calibration methods face a bias-variance trade-off: global temperature scaling is efficient but suffers from high bias, while more expressive methods incur high variance due to noise in high-dimensional logits and the scarcity of calibration data. This paper proposes SMART (Sample Margin-Aware Recalibration of Temperature), a lightweight, data-efficient, sample-aware calibration method that uses the gap between the top two logits as a denoised scalar signal to estimate a per-sample temperature. It further introduces a soft-binned Expected Calibration Error (SoftECE) loss, enabling robust bias-variance optimization under limited calibration data. Evaluated across multiple models and datasets, SMART achieves state-of-the-art calibration performance with significantly fewer parameters than existing approaches, and it converges stably with as few as 32 calibration samples, demonstrating exceptional data efficiency and practicality for safety-critical applications.
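A minimal NumPy sketch of the core idea: compute the margin between the top two logits, map it to a per-sample temperature, and rescale the logits. The specific mapping used here, `exp(a * gap + b)` with scalars `a` and `b`, is a hypothetical parameterization for illustration; the paper's exact form may differ.

```python
import numpy as np

def logit_gap(logits):
    """Gap between the top-two logits for each sample (a denoised scalar signal)."""
    s = np.sort(logits, axis=1)          # ascending sort along the class axis
    return s[:, -1] - s[:, -2]           # top-1 minus top-2, always >= 0

def per_sample_temperature(logits, a=0.5, b=0.1):
    """Hypothetical monotone map from logit gap to a positive temperature."""
    return np.exp(a * logit_gap(logits) + b)

def calibrate(logits, a=0.5, b=0.1):
    """Rescale each sample's logits by its own temperature, then softmax.

    Dividing by a positive per-sample scalar never changes the argmax,
    so the model's predictions are preserved (prediction invariance).
    """
    T = per_sample_temperature(logits, a, b)[:, None]
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)
```

Because the temperature is strictly positive, the ordering of each sample's logits is unchanged, which is why this scheme can sharpen or soften confidence without altering accuracy.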
📝 Abstract
Recent advances in deep learning have significantly improved predictive accuracy. However, modern neural networks remain systematically overconfident, posing risks for deployment in safety-critical scenarios. Current post-hoc calibration methods face a fundamental dilemma: global approaches like Temperature Scaling apply uniform adjustments across all samples, introducing high bias despite their computational efficiency, while more expressive methods that operate on full logit distributions suffer from high variance due to noisy high-dimensional inputs and insufficient validation data. To address these challenges, we propose Sample Margin-Aware Recalibration of Temperature (SMART), a lightweight, data-efficient recalibration method that scales logits based on the margin between the top two logits, termed the logit gap. The logit gap serves as a denoised scalar signal directly tied to decision-boundary uncertainty, providing a robust indicator that avoids the noise inherent in high-dimensional logit spaces while preserving model prediction invariance. In addition, SMART employs a novel soft-binned Expected Calibration Error (SoftECE) objective that balances bias and variance through adaptive binning, enabling stable parameter updates even with extremely limited calibration data. Extensive evaluations across diverse datasets and architectures demonstrate that SMART achieves state-of-the-art calibration performance with substantially fewer parameters than existing parametric methods, offering a principled, robust, and highly efficient solution for practical uncertainty quantification in neural network predictions. The source code is available at: https://anonymous.4open.science/r/SMART-8B11.
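To make the soft-binned objective concrete, here is a rough sketch of what a SoftECE-style loss could look like: instead of hard-assigning each sample to one confidence bin, samples are softly weighted across bin centers with a kernel, yielding a smooth (differentiable-in-form) surrogate for the usual binned ECE. The Gaussian kernel, fixed bin centers, and `bandwidth` parameter are assumptions for illustration; the paper's adaptive binning may differ.

```python
import numpy as np

def soft_ece(conf, acc, n_bins=10, bandwidth=0.05):
    """Soft-binned Expected Calibration Error (illustrative form).

    conf: per-sample confidence (top-class probability), shape (n,)
    acc:  per-sample correctness as 0.0/1.0, shape (n,)
    """
    centers = (np.arange(n_bins) + 0.5) / n_bins       # evenly spaced bin centers
    # Gaussian kernel weight of each sample for each bin center (assumed kernel)
    k = np.exp(-((conf[:, None] - centers[None, :]) ** 2) / (2 * bandwidth ** 2))
    k = k / (k.sum(axis=1, keepdims=True) + 1e-12)     # each sample's mass sums to 1
    mass = k.sum(axis=0)                               # soft sample count per bin
    avg_conf = (k * conf[:, None]).sum(axis=0) / (mass + 1e-12)
    avg_acc = (k * acc[:, None]).sum(axis=0) / (mass + 1e-12)
    # mass-weighted confidence/accuracy discrepancy, as in the standard ECE
    return float(np.sum(mass / len(conf) * np.abs(avg_acc - avg_conf)))
```

Compared with hard binning, the soft assignment avoids the piecewise-constant behavior that makes standard ECE unsuitable as a training objective, which is what allows stable parameter updates from only a handful of calibration samples.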