🤖 AI Summary
This work addresses a pervasive logical misalignment between training and testing in how classifier calibration is evaluated, introducing the “Fit-on-the-Test” perspective. It shows that standard metrics such as Expected Calibration Error (ECE) implicitly refit the calibration map on the test set, which induces optimistic bias and undermines the reliability of reported results. Through theoretical analysis, a decomposition of calibration error, Monte Carlo simulations, and empirical evaluation across multiple benchmarks (CIFAR-10/100 and ImageNet subsets), the authors demonstrate, for the first time, substantial performance degradation of mainstream calibration methods once this implicit refitting is taken into account. Building on these findings, they propose a more rigorous calibration evaluation framework and a revised protocol that explicitly prevents implicit test-set refitting. The approach improves assessment reliability, statistical unbiasedness, and cross-method comparability, enabling fairer and more trustworthy calibration evaluation.
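To make the implicit refit concrete, below is a minimal Python sketch of standard binned ECE: the bin-wise accuracies it compares confidences against are estimated from the test set itself, which is exactly a histogram-binning calibration map fit on the test data. A second function sketches one simple way the refit can be avoided, by estimating the map on a disjoint split. The function names, the 15-bin default, and the disjoint-split variant are illustrative assumptions, not the paper's exact protocol or code.

```python
import numpy as np

def binned_ece(conf, correct, n_bins=15):
    """Standard equal-width binned ECE.

    The per-bin mean accuracies are estimated from the same data being
    evaluated, so they constitute a histogram-binning calibration map
    implicitly (re)fit on the test set: the "fit-on-the-test" view.
    """
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each confidence to a bin 0..n_bins-1 via the interior edges.
    bin_ids = np.digitize(conf, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        # |mean confidence - empirical accuracy| in the bin,
        # weighted by the fraction of points falling in the bin.
        ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

def ece_disjoint_fit(conf_fit, correct_fit, conf_test, correct_test, n_bins=15):
    """Hypothetical variant sketching the corrective idea: the bin-wise
    calibration targets are fit on a disjoint split, and only the gap
    is measured on the test set. An illustration, not the paper's
    exact revised protocol."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    fit_ids = np.digitize(np.asarray(conf_fit, dtype=float), edges[1:-1], right=True)
    test_ids = np.digitize(np.asarray(conf_test, dtype=float), edges[1:-1], right=True)
    correct_fit = np.asarray(correct_fit, dtype=float)
    conf_test = np.asarray(conf_test, dtype=float)
    ece = 0.0
    for b in range(n_bins):
        fit_mask, test_mask = fit_ids == b, test_ids == b
        if not fit_mask.any() or not test_mask.any():
            continue
        # Calibration target estimated on the disjoint split, not on test.
        target = correct_fit[fit_mask].mean()
        ece += test_mask.mean() * abs(conf_test[test_mask].mean() - target)
    return ece

# Illustrative usage with synthetic, perfectly calibrated scores.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 5000)
correct = (rng.uniform(size=5000) < conf).astype(float)
print(binned_ece(conf, correct))
```

Even for the perfectly calibrated synthetic scores above, `binned_ece` returns a small positive value, since the per-bin accuracy estimates it refits on the evaluation data are themselves noisy; this finite-sample effect is one face of the optimistic bias the summary describes.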