Optimizing Calibration by Gaining Aware of Prediction Correctness

📅 2024-04-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Modern deep neural networks are often miscalibrated: they are overconfident on incorrect predictions and offer too little confidence separation between correct and incorrect ones. Method: This paper proposes a post-hoc calibration objective derived directly from the aim of calibration: the calibrator is trained to lower confidence on wrongly predicted samples and raise it on correctly predicted ones. Because a single sample carries little signal about its own correctness, calibrator training also uses lightweight transformed versions of each sample (e.g., rotated, greyscaled, and color-jittered copies), sidestepping the intrinsic limitations of cross-entropy-style calibrator losses. Contribution/Results: Trained on an in-distribution validation set, the method achieves competitive, often state-of-the-art calibration on both in-distribution and out-of-distribution test sets, mitigating overconfidence on erroneous predictions while making confidence scores more discriminative for correct ones.

📝 Abstract
Model calibration aims to align confidence with prediction correctness. The Cross-Entropy (CE) loss is widely used for calibrator training, which enforces the model to increase confidence on the ground truth class. However, we find the CE loss has intrinsic limitations. For example, for a narrow misclassification (e.g., a test sample is wrongly classified and its softmax score on the ground truth class is 0.4), a calibrator trained by the CE loss often produces high confidence on the wrongly predicted class, which is undesirable. In this paper, we propose a new post-hoc calibration objective derived from the aim of calibration. Intuitively, the proposed objective function asks that the calibrator decrease model confidence on wrongly predicted samples and increase confidence on correctly predicted samples. Because a sample itself has insufficient ability to indicate correctness, we use its transformed versions (e.g., rotated, greyscaled, and color-jittered) during calibrator training. Trained on an in-distribution validation set and tested with isolated, individual test samples, our method achieves competitive calibration performance on both in-distribution and out-of-distribution test sets compared with the state of the art. Further, our analysis points out the difference between our method and commonly used objectives such as CE loss and Mean Square Error (MSE) loss, where the latter sometimes deviate from the calibration aim.
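The spirit of the objective described in the abstract can be sketched in a few lines: reward high confidence on correctly predicted samples and penalize it on wrongly predicted ones. This is an illustrative toy version, not the paper's exact formulation; the function name and loss form are assumptions for demonstration.

```python
import numpy as np

def correctness_aware_loss(confidence, correct):
    """Toy calibration objective in the spirit of the abstract:
    push confidence up on correct predictions and down on wrong ones.
    `confidence` holds values in (0, 1); `correct` is a boolean mask
    of prediction correctness. (Illustrative only, not the paper's
    exact objective.)"""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Penalize low confidence on correctly predicted samples...
    loss_correct = (1.0 - confidence[correct]).mean() if correct.any() else 0.0
    # ...and high confidence on wrongly predicted samples.
    loss_wrong = confidence[~correct].mean() if (~correct).any() else 0.0
    return loss_correct + loss_wrong

# A confidence assignment aligned with correctness yields a lower loss:
good = correctness_aware_loss([0.9, 0.8, 0.2], [True, True, False])
bad = correctness_aware_loss([0.3, 0.4, 0.9], [True, True, False])
assert good < bad
```

Unlike CE loss, which always pushes confidence toward the ground-truth class, this kind of objective explicitly asks for low confidence whenever the prediction itself is wrong, matching the narrow-misclassification example in the abstract.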
Problem

Research questions and friction points this paper is trying to address.

Improve model calibration accuracy
Address Cross-Entropy loss limitations
Enhance confidence on correct predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

New post-hoc calibration objective
Use transformed sample versions
Improved in- and out-of-distribution calibration
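The "transformed sample versions" idea above rests on the observation that a sample alone cannot indicate its own correctness, but agreement across its transformed views can serve as a proxy. A minimal sketch of one such proxy, assuming softmax outputs are already available for the original and transformed views (the function name and agreement measure are hypothetical, not the paper's exact mechanism):

```python
import numpy as np

def transform_agreement(probs_orig, probs_transformed):
    """Hypothetical correctness proxy: the fraction of transformed views
    (e.g., rotated or greyscaled copies) whose predicted class matches the
    original prediction. Predictions that are stable under such
    transformations tend to be correct."""
    base = int(np.argmax(probs_orig))
    views = [int(np.argmax(p)) for p in probs_transformed]
    return sum(v == base for v in views) / len(views)

# A stable sample: both transformed views agree with the original.
stable = transform_agreement([0.7, 0.2, 0.1],
                             [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1]])
# A borderline sample: the views disagree.
unstable = transform_agreement([0.4, 0.35, 0.25],
                               [[0.3, 0.5, 0.2], [0.45, 0.3, 0.25]])
assert stable == 1.0 and unstable == 0.5
```

A calibrator trained with such a signal can lower confidence on unstable (likely wrong) samples and raise it on stable (likely correct) ones, which is the behavior the proposed objective asks for.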
Yuchi Liu
Tsinghua University
Lei Wang
School of Computing, The Australian National University (ANU), Canberra, Australia; Data61, The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australia
Yuli Zou
The Hong Kong Polytechnic University (PolyU), Hong Kong, China
James Zou
Stanford University
Machine learning, computational biology, computational health, statistics, biotech
Liang Zheng
School of Computing, The Australian National University (ANU), Canberra, Australia