On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers

📅 2022-03-16
🏛️ Machine-mediated learning
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work addresses a pervasive mismatch between fitting and testing in the evaluation of classifier calibration and introduces the “Fit-on-the-Test” perspective. It shows that standard metrics such as Expected Calibration Error (ECE) implicitly refit the calibration mapping on the test set, which induces optimistic bias and compromises reliability. Through theoretical analysis, calibration error decomposition, Monte Carlo simulations, and empirical evaluation on multiple benchmarks (CIFAR-10/100, ImageNet subsets), the authors systematically demonstrate, for the first time, substantial performance degradation of mainstream calibration methods under this perspective. Building on these findings, they propose a more rigorous calibration evaluation framework and a revised evaluation protocol that explicitly prevents implicit test-set refitting. The approach aims to improve assessment reliability, statistical unbiasedness, and cross-method comparability, enabling fairer and more trustworthy calibration evaluation.
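One way to read the claim that ECE refits a calibration mapping on the test set is sketched below. It is a minimal illustration, not the authors' code: the function name `binned_ece_two_ways`, the equal-width binning, and the default of 15 bins are all assumptions. The sketch computes the usual binned ECE and, alongside it, the mean absolute gap between each confidence and the output of a map fitted on the same evaluation data (within each bin, the confidence shifted by that bin's accuracy-minus-confidence gap). The two numbers coincide, which is the sense in which the metric depends on a fit made on the data it evaluates.

```python
import numpy as np

def binned_ece_two_ways(conf, correct, n_bins=15):
    """Return (binned ECE, mean |fitted(p) - p|) for top-label confidences.

    Hypothetical helper for illustration; equal-width bins are an assumption.
    """
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Equal-width bin index in 0..n_bins-1; a confidence of 1.0 goes to the last bin.
    idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)

    ece = 0.0
    shift = np.zeros(n_bins)  # per-bin accuracy minus mean confidence
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            shift[b] = correct[mask].mean() - conf[mask].mean()
            # Weight each bin by the fraction of points that fall into it.
            ece += mask.mean() * abs(shift[b])

    # The "map fitted on the test data": shift every confidence by its bin's gap.
    fitted = conf + shift[idx]
    return ece, float(np.mean(np.abs(fitted - conf)))
```

Calling `binned_ece_two_ways(conf, correct)` on any array of confidences and 0/1 correctness indicators returns two values that agree up to floating-point rounding.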
Problem

Research questions and friction points this paper is trying to address.

Evaluates classifier calibration using the fit-on-the-test view.
Reduces calibration errors with post-hoc calibration methods.
Introduces novel calibration and evaluation techniques.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fit calibration maps on the test data
Tune the number of ECE bins by cross-validation
Introduce novel calibration methods
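The idea of tuning the number of ECE bins by cross-validation can be sketched as follows. This is a hedged illustration under stated assumptions rather than the paper's procedure: the helper names (`fit_bin_map`, `apply_bin_map`, `choose_n_bins`), the candidate bin counts, the squared-error score on held-out folds, and the bin-midpoint fallback for empty bins are all choices made for this example. For each candidate, a piecewise-constant map (per-bin accuracy) is fitted on all but one fold of the evaluation data and scored on the held-out fold.

```python
import numpy as np

def fit_bin_map(conf, correct, n_bins):
    """Fit a piecewise-constant map: one accuracy value per equal-width bin."""
    idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    # Assumption of this sketch: empty bins fall back to their midpoint.
    values = (np.arange(n_bins) + 0.5) / n_bins
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            values[b] = correct[mask].mean()
    return values

def apply_bin_map(values, conf):
    """Look up the fitted value of the bin each confidence falls into."""
    n_bins = len(values)
    idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    return values[idx]

def choose_n_bins(conf, correct, candidates=(2, 5, 10, 15, 20), n_folds=5, seed=0):
    """Pick the bin count with the lowest cross-validated squared error."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(conf)), n_folds)
    scores = {}
    for n_bins in candidates:
        sq_err = 0.0
        for k in range(n_folds):
            held = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            values = fit_bin_map(conf[train], correct[train], n_bins)
            pred = apply_bin_map(values, conf[held])
            sq_err += np.sum((pred - correct[held]) ** 2)
        scores[n_bins] = sq_err / len(conf)
    best = min(scores, key=scores.get)
    return best, scores
```

The selected bin count could then be used when reporting binned ECE on the same data. Other held-out scores, such as log-loss, would fit the same loop.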
Markus Kängsepp
Institute of Computer Science, University of Tartu, Narva mnt, Tartu, 51009, Tartumaa, Estonia.
Kaspar Valk
Institute of Computer Science, University of Tartu, Narva mnt, Tartu, 51009, Tartumaa, Estonia.
Meelis Kull
Professor of Artificial Intelligence, University of Tartu
Machine learning, Classifier calibration, Uncertainty quantification, Data science, #unitartucs