Optimizing Calibration by Gaining Aware of Prediction Correctness

📅 2024-04-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Modern deep neural networks are often miscalibrated: they are overconfident on incorrect predictions and offer too little confidence separation between correct and incorrect ones. Method: This paper proposes a post-hoc calibration objective derived directly from the aim of calibration: the calibrator is trained to lower confidence on wrongly predicted samples and raise it on correctly predicted ones. Because a single sample carries little signal about its own correctness, calibrator training also uses lightweight transformed versions of each sample (e.g., rotated, greyscaled, and color-jittered copies), sidestepping the intrinsic limitations of cross-entropy-style calibrator losses. Contribution/Results: Trained on an in-distribution validation set, the method achieves competitive, often state-of-the-art calibration on both in-distribution and out-of-distribution test sets, mitigating overconfidence on erroneous predictions while making confidence scores more discriminative for correct ones.

📝 Abstract
Model calibration aims to align confidence with prediction correctness. The Cross-Entropy (CE) loss is widely used for calibrator training, which enforces the model to increase confidence on the ground truth class. However, we find the CE loss has intrinsic limitations. For example, for a narrow misclassification (e.g., a test sample is wrongly classified and its softmax score on the ground truth class is 0.4), a calibrator trained by the CE loss often produces high confidence on the wrongly predicted class, which is undesirable. In this paper, we propose a new post-hoc calibration objective derived from the aim of calibration. Intuitively, the proposed objective function asks that the calibrator decrease model confidence on wrongly predicted samples and increase confidence on correctly predicted samples. Because a sample itself has insufficient ability to indicate correctness, we use its transformed versions (e.g., rotated, greyscaled, and color-jittered) during calibrator training. Trained on an in-distribution validation set and tested with isolated, individual test samples, our method achieves competitive calibration performance on both in-distribution and out-of-distribution test sets compared with the state of the art. Further, our analysis points out the difference between our method and commonly used objectives such as CE loss and Mean Square Error (MSE) loss, where the latter sometimes deviate from the calibration aim.
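The spirit of the objective described in the abstract can be sketched in a few lines: reward high confidence on correctly predicted samples and penalize it on wrongly predicted ones. This is an illustrative toy version, not the paper's exact formulation; the function name and loss form are assumptions for demonstration.

```python
import numpy as np

def correctness_aware_loss(confidence, correct):
    """Toy calibration objective in the spirit of the abstract:
    push confidence up on correct predictions and down on wrong ones.
    `confidence` holds values in (0, 1); `correct` is a boolean mask
    of prediction correctness. (Illustrative only, not the paper's
    exact objective.)"""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Penalize low confidence on correctly predicted samples...
    loss_correct = (1.0 - confidence[correct]).mean() if correct.any() else 0.0
    # ...and high confidence on wrongly predicted samples.
    loss_wrong = confidence[~correct].mean() if (~correct).any() else 0.0
    return loss_correct + loss_wrong

# A confidence assignment aligned with correctness yields a lower loss:
good = correctness_aware_loss([0.9, 0.8, 0.2], [True, True, False])
bad = correctness_aware_loss([0.3, 0.4, 0.9], [True, True, False])
assert good < bad
```

Unlike CE loss, which always pushes confidence toward the ground-truth class, this kind of objective explicitly asks for low confidence whenever the prediction itself is wrong, matching the narrow-misclassification example in the abstract.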
Problem

Research questions and friction points this paper is trying to address.

Improve model calibration accuracy
Address Cross-Entropy loss limitations
Enhance confidence on correct predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

New post-hoc calibration objective
Use transformed sample versions
Improved in- and out-of-distribution calibration
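The "transformed sample versions" idea above rests on the observation that a sample alone cannot indicate its own correctness, but agreement across its transformed views can serve as a proxy. A minimal sketch of one such proxy, assuming softmax outputs are already available for the original and transformed views (the function name and agreement measure are hypothetical, not the paper's exact mechanism):

```python
import numpy as np

def transform_agreement(probs_orig, probs_transformed):
    """Hypothetical correctness proxy: the fraction of transformed views
    (e.g., rotated or greyscaled copies) whose predicted class matches the
    original prediction. Predictions that are stable under such
    transformations tend to be correct."""
    base = int(np.argmax(probs_orig))
    views = [int(np.argmax(p)) for p in probs_transformed]
    return sum(v == base for v in views) / len(views)

# A stable sample: both transformed views agree with the original.
stable = transform_agreement([0.7, 0.2, 0.1],
                             [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1]])
# A borderline sample: the views disagree.
unstable = transform_agreement([0.4, 0.35, 0.25],
                               [[0.3, 0.5, 0.2], [0.45, 0.3, 0.25]])
assert stable == 1.0 and unstable == 0.5
```

A calibrator trained with such a signal can lower confidence on unstable (likely wrong) samples and raise it on stable (likely correct) ones, which is the behavior the proposed objective asks for.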
Yuchi Liu
Tsinghua University
Lei Wang
School of Computing, The Australian National University (ANU), Canberra, Australia; Data61, The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australia
Yuli Zou
The Hong Kong Polytechnic University (PolyU), Hong Kong, China
James Zou
Stanford University
Machine learning, computational biology, computational health, statistics, biotech
Liang Zheng
School of Computing, The Australian National University (ANU), Canberra, Australia