🤖 AI Summary
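Problem: Standard uncertainty estimation methods such as dropout often fail to separate reliable (high-confidence) predictions from unreliable (low-confidence) ones. The authors attribute this to noise in the classifier weights, which leaves class-level predictions largely intact but makes finer-grained confidence statistics less informative.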
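The summary does not say how the weight noise is modeled. As a toy probe only, assuming i.i.d. Gaussian noise on the weights of a hypothetical linear classifier head, the sketch below measures two things separately: how often the predicted class changes, and how far the top softmax probability moves. The functions `noise_effect` and `linear_logits` and the parameter `sigma` are illustrative names, not the paper's method.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, x):
    """Logits of a hypothetical linear head: one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def noise_effect(weights, inputs, sigma, seed=0):
    """Toy probe of Gaussian weight noise (std `sigma`) on a linear head.

    Returns (label_agreement, mean_conf_shift):
      label_agreement: fraction of inputs whose argmax class is unchanged
      mean_conf_shift: mean |top softmax prob (clean) - top softmax prob (noisy)|
    """
    rng = random.Random(seed)
    # One fixed noisy copy of the weights (assumed i.i.d. Gaussian noise model).
    noisy = [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]
    agree, shift = 0, 0.0
    for x in inputs:
        p_clean = softmax(linear_logits(weights, x))
        p_noisy = softmax(linear_logits(noisy, x))
        c_clean = max(range(len(p_clean)), key=p_clean.__getitem__)
        c_noisy = max(range(len(p_noisy)), key=p_noisy.__getitem__)
        agree += int(c_clean == c_noisy)
        shift += abs(p_clean[c_clean] - p_noisy[c_noisy])
    return agree / len(inputs), shift / len(inputs)
```

With `sigma=0.0` the probe returns `(1.0, 0.0)`. For small positive `sigma`, labels tend to flip only for inputs near a decision boundary while confidences move for every input; how large each effect is depends entirely on the toy data, so treat the output as an experiment, not as a reproduction of the paper's finding.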
Method: A test-time optimization framework that accounts for classifier weight noise when producing confidence scores: (i) it explicitly models how weight noise affects output confidence; (ii) the resulting score defines a monotonic subset-selection function, so population accuracy consistently increases as low-confidence samples are filtered out; and (iii) it is used to characterize key differences in confidence behavior between CNN and Vision Transformer (ViT) classifiers across several vision datasets.
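The monotonic subset-selection property can be checked empirically on a labeled set. A minimal sketch, assuming one scalar score per sample and binary correctness labels: sort by score, compute the accuracy of the top-k samples for every k, and verify that dropping the lowest-scoring sample never lowers accuracy. The function names are hypothetical, and the paper may define monotonicity at the population level rather than per removed sample.

```python
def accuracy_coverage_curve(scores, correct):
    """Accuracy of the k highest-scoring samples, for k = 1..n.

    Reading the list from the end toward the start corresponds to removing
    the lowest-scoring samples one at a time.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, hits = [], 0
    for k, i in enumerate(order, start=1):
        hits += int(correct[i])
        curve.append(hits / k)
    return curve


def is_monotonic_selector(scores, correct, tol=1e-12):
    """True if accuracy never drops when the lowest-scoring sample is removed."""
    curve = accuracy_coverage_curve(scores, correct)
    # Removing one sample moves from curve[k] to curve[k - 1];
    # require curve[k - 1] >= curve[k] (up to a small tolerance).
    return all(curve[k - 1] >= curve[k] - tol for k in range(1, len(curve)))
```

A score that ranks every correct prediction above every incorrect one always passes this check; a score that places an error at the top fails it.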
Contribution/Results: The method outperforms dropout and other baselines on standard risk-based metrics, AUSE (area under the sparsification error curve) and AURC (area under the risk-coverage curve). It also detects discrepancies between training and test distributions and reliably separates in-distribution from out-of-distribution samples. The monotonic score is presented as a step toward more interpretable and verifiable confidence estimates for trustworthy AI.
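For readers unfamiliar with the two metrics, here is a minimal sketch of both. It assumes binary correctness labels for AURC, per-sample error values (e.g., 0-1 loss) for AUSE, a discrete mean over retained-set sizes in place of an integral, and no normalization of the sparsification curves. Published implementations differ in these details, and this is not the paper's evaluation code.

```python
def _retained_mean_error(errors, order):
    """Mean error of the first k samples in `order`, for k = 1..n."""
    out, total = [], 0.0
    for k, i in enumerate(order, start=1):
        total += errors[i]
        out.append(total / k)
    return out


def aurc(confidences, correct):
    """Area under the risk-coverage curve (lower is better).

    Risk at coverage k/n is the error rate of the k most confident samples;
    the area is approximated by the mean risk over k = 1..n.
    """
    errors = [0.0 if c else 1.0 for c in correct]
    order = sorted(range(len(errors)), key=lambda i: confidences[i], reverse=True)
    risks = _retained_mean_error(errors, order)
    return sum(risks) / len(risks)


def ause(confidences, errors):
    """Area under the sparsification error curve (lower is better).

    Sparsification curve: mean error of the samples kept after discarding the
    least confident ones. Oracle curve: the same, but discarding by true error.
    AUSE averages the gap between the two over all retained-set sizes.
    """
    n = len(errors)
    by_score = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    by_oracle = sorted(range(n), key=lambda i: errors[i])
    curve = _retained_mean_error(errors, by_score)
    oracle = _retained_mean_error(errors, by_oracle)
    return sum(c - o for c, o in zip(curve, oracle)) / n
```

A confidence score that orders samples exactly like the oracle gets an AUSE of zero. AURC, by contrast, also penalizes the base error rate, which is why the two metrics are usually reported together.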
📝 Abstract
Standard uncertainty estimation techniques, such as dropout, often struggle to clearly distinguish reliable predictions from unreliable ones. We attribute this limitation to noisy classifier weights, which, while not impairing overall class-level predictions, render finer-level statistics less informative. To address this, we propose a novel test-time optimization method that accounts for the impact of such noise to produce more reliable confidence estimates. The resulting score defines a monotonic subset-selection function, in which population accuracy consistently increases as samples with lower scores are removed, and it achieves superior performance on standard risk-based metrics such as AUSE and AURC. Additionally, our method effectively identifies discrepancies between training and test distributions, reliably differentiates in-distribution from out-of-distribution samples, and helps elucidate key differences between CNN and ViT classifiers across various vision datasets.
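For context on the dropout baseline the abstract refers to, here is a generic Monte Carlo dropout confidence sketch. It is not the proposed method. It assumes a hypothetical linear head, inverted dropout applied to the input features, and illustrative values for the dropout rate `p` and the number of stochastic `passes`; real baselines apply dropout inside the network and tune these settings.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mc_dropout_confidence(weights, x, p=0.5, passes=20, seed=0):
    """Predicted class and max of the MC-averaged softmax for one input.

    Each pass zeroes input features with probability p (inverted dropout:
    survivors are scaled by 1 / (1 - p)), runs the linear head, and the
    softmax outputs are averaged over `passes` stochastic passes.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout rate p must be in [0, 1)")
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(passes):
        kept = [xi / (1.0 - p) if rng.random() >= p else 0.0 for xi in x]
        logits = [sum(w * k for w, k in zip(row, kept)) for row in weights]
        mean = [m + q / passes for m, q in zip(mean, softmax(logits))]
    label = max(range(len(mean)), key=mean.__getitem__)
    return label, mean[label]
```

With `p=0.0` every pass is identical, so the result reduces to the ordinary softmax confidence; the averaged maximum probability is the kind of score the abstract argues is blurred by weight noise.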