TRUST: Test-time Resource Utilization for Superior Trustworthiness

📅 2025-06-06

🤖 AI Summary

Problem: Existing uncertainty estimation methods (e.g., dropout) often fail to separate high-confidence from low-confidence predictions. The paper attributes this to noise in the classifier weights, which leaves class-level predictions intact but makes finer-grained confidence statistics less informative. Method: The authors propose a test-time optimization method that accounts for the effect of this weight noise when estimating confidence. The resulting score defines a monotonic subset-selection function: population accuracy consistently increases as samples with lower scores are removed. Contribution/Results: The score performs better than dropout on standard risk-based metrics (AUSE, AURC), identifies discrepancies between training and test distributions, reliably differentiates in-distribution from out-of-distribution samples, and highlights key differences between CNN and ViT classifiers across several vision datasets.

📝 Abstract

Standard uncertainty estimation techniques, such as dropout, often struggle to clearly distinguish reliable predictions from unreliable ones. We attribute this limitation to noisy classifier weights, which, while not impairing overall class-level predictions, render finer-level statistics less informative. To address this, we propose a novel test-time optimization method that accounts for the impact of such noise to produce more reliable confidence estimates. This score defines a monotonic subset-selection function, where population accuracy consistently increases as samples with lower scores are removed, and it demonstrates superior performance in standard risk-based metrics such as AUSE and AURC. Additionally, our method effectively identifies discrepancies between training and test distributions, reliably differentiates in-distribution from out-of-distribution samples, and elucidates key differences between CNN and ViT classifiers across various vision datasets.
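The monotonic subset-selection property and the AURC metric mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy data, and the choice to approximate AURC as the mean risk over all coverage levels are assumptions made for illustration. The sketch takes any per-sample confidence score and a 0/1 correctness vector, builds the risk-coverage curve, and checks whether accuracy never drops as the lowest-scoring samples are removed.

```python
import numpy as np

def risk_coverage_curve(confidence, correct):
    """Return (coverage, risk) with samples sorted from most to least confident."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(-confidence, kind="stable")  # highest score first
    errors = 1.0 - correct[order]
    kept = np.arange(1, len(errors) + 1)            # number of retained samples
    coverage = kept / len(errors)
    risk = np.cumsum(errors) / kept                 # error rate among retained samples
    return coverage, risk

def aurc(confidence, correct):
    """Approximate AURC as the mean risk over all coverage levels (an assumption)."""
    _, risk = risk_coverage_curve(confidence, correct)
    return float(risk.mean())

def is_monotonic_selection(confidence, correct, tol=1e-12):
    """True if accuracy never decreases as the lowest-scoring samples are removed."""
    _, risk = risk_coverage_curve(confidence, correct)
    # Removing low-score samples moves toward lower coverage,
    # so risk must be non-decreasing as coverage grows.
    return bool(np.all(np.diff(risk) >= -tol))

# Hypothetical toy data: an informative score and a reversed (uninformative) one.
correct = [1, 1, 0, 0]
good_score = [0.9, 0.8, 0.7, 0.6]
bad_score = [0.6, 0.7, 0.8, 0.9]
```

With this toy data, `is_monotonic_selection(good_score, correct)` holds while the reversed score fails the check, and the informative score has the lower AURC. Lower AURC is better, since it means fewer errors among the samples retained at each coverage level.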

Problem

Research questions and friction points this paper is trying to address.

Standard uncertainty estimates (e.g., dropout) do not clearly separate reliable predictions from unreliable ones

Noisy classifier weights make fine-grained confidence statistics less informative

In-distribution and out-of-distribution samples need to be told apart reliably

Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-time optimization for reliable confidence estimates

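Monotonic subset-selection function: population accuracy increases as lower-scoring samples are removed

Identifies discrepancies between training and test distributions and differences between CNN and ViT classifiers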
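How well a confidence score separates in-distribution from out-of-distribution samples is commonly summarized with AUROC. The sketch below is a generic illustration, not the paper's evaluation code: the function name `ood_auroc` and the example scores are hypothetical, and it assumes that higher scores mean "more likely in-distribution". It computes AUROC as the fraction of (in-distribution, out-of-distribution) pairs that the score ranks correctly, counting ties as half.

```python
import numpy as np

def ood_auroc(id_scores, ood_scores):
    """AUROC for separating in-distribution (ID) from out-of-distribution (OOD) samples.

    Assumes higher scores indicate ID samples. Uses the pairwise
    (Mann-Whitney) formulation, which is fine for small arrays.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Compare every ID score with every OOD score.
    diff = id_scores[:, None] - ood_scores[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()  # ties count as half
    return float(wins / diff.size)

# Hypothetical confidence scores for illustration only.
id_scores = [0.9, 0.8, 0.7]
ood_scores = [0.6, 0.75, 0.2]
auroc = ood_auroc(id_scores, ood_scores)
```

A value of 1.0 means every in-distribution sample scores above every out-of-distribution sample, and 0.5 corresponds to a score that carries no information about the split. The pairwise comparison uses memory proportional to the product of the two array sizes, so a rank-based implementation would be preferable for large test sets.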
