AI Summary
Existing calibration error estimators are neither differentiable nor tunable in a principled way, hindering end-to-end calibration optimization.
Method: We formulate squared calibration error estimation as a regression task over i.i.d. sample pairs, adopting mean-squared error (MSE) as the risk criterion. Leveraging the bilinear structure of the squared calibration error, we employ kernel ridge regression with joint hyperparameter optimization within a novel train-validation-test estimation pipeline.
Contribution/Results: This work establishes the first unified risk-based framework for calibration error estimation; reformulates canonical calibration error estimation as a learnable, differentiable regression problem; and introduces a principled three-stage estimation protocol. Evaluated on standard image classification benchmarks, our estimator achieves significantly higher accuracy than state-of-the-art methods. It is the first practical, end-to-end optimizable estimator for canonical calibration error, enabling gradient-based calibration refinement.
Abstract
In this work, we propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors in practical settings. Improving the calibration of classifiers is crucial for enhancing the trustworthiness and interpretability of machine learning models, especially in sensitive decision-making scenarios. Although various calibration (error) estimators exist in the current literature, there is a lack of guidance on selecting the appropriate estimator and tuning its hyperparameters. By leveraging the bilinear structure of squared calibration errors, we reformulate calibration estimation as a regression problem with independent and identically distributed (i.i.d.) input pairs. This reformulation allows us to quantify the performance of different estimators even for the most challenging calibration criterion, known as canonical calibration. Our approach advocates for a training-validation-testing pipeline when estimating a calibration error on an evaluation dataset. We demonstrate the effectiveness of our pipeline by optimizing existing calibration estimators and comparing them with novel kernel ridge regression-based estimators on standard image classification tasks.
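To make the pipeline concrete, here is a minimal numpy-only sketch of the idea described above: regress labels on predicted probabilities with kernel ridge regression (fit on a train split), select kernel and ridge hyperparameters by regression MSE on a validation split, and report a plug-in squared calibration error on the test split. This is an illustrative reading of the abstract, not the paper's implementation; all function names, the hyperparameter grid, and the synthetic binary-classifier setup are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between rows of A and rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_krr(F_train, Y_train, gamma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = Y."""
    K = rbf_kernel(F_train, F_train, gamma)
    return np.linalg.solve(K + lam * np.eye(len(F_train)), Y_train)

def predict_krr(F_train, alpha, F_new, gamma):
    return rbf_kernel(F_new, F_train, gamma) @ alpha

def estimate_squared_ce(F, Y, gammas, lams, rng):
    """Three-stage pipeline (illustrative): fit r_hat(f) ~ E[Y | f] on the
    train split, pick (gamma, lam) by regression MSE on the validation
    split, then report a plug-in squared calibration error on the test
    split, i.e. the mean of ||r_hat(f) - f||^2 over held-out samples."""
    n = len(F)
    idx = rng.permutation(n)
    train, val, test = np.split(idx, [n // 3, 2 * n // 3])
    best = None
    for gamma in gammas:
        for lam in lams:
            alpha = fit_krr(F[train], Y[train], gamma, lam)
            mse = ((predict_krr(F[train], alpha, F[val], gamma) - Y[val]) ** 2).mean()
            if best is None or mse < best[0]:
                best = (mse, gamma, alpha)
    _, gamma, alpha = best
    r_hat = predict_krr(F[train], alpha, F[test], gamma)
    return ((r_hat - F[test]) ** 2).sum(axis=1).mean()

# Synthetic binary classifier: predicted probabilities F, one-hot labels Y.
rng = np.random.default_rng(0)
n = 600
p = rng.uniform(0.1, 0.9, size=n)
F = np.stack([p, 1.0 - p], axis=1)

# Calibrated case: labels actually drawn with probability p.
y_cal = (rng.uniform(size=n) < p).astype(float)
Y_cal = np.stack([y_cal, 1.0 - y_cal], axis=1)

# Miscalibrated case: labels drawn with probability 0.5, ignoring p.
y_mis = (rng.uniform(size=n) < 0.5).astype(float)
Y_mis = np.stack([y_mis, 1.0 - y_mis], axis=1)

grid_g, grid_l = [0.5, 2.0, 8.0], [1e-3, 1e-2, 1e-1]
ce_calibrated = estimate_squared_ce(F, Y_cal, grid_g, grid_l, rng)
ce_miscalibrated = estimate_squared_ce(F, Y_mis, grid_g, grid_l, rng)
```

On this toy data the estimate for the calibrated model should be close to zero, while the miscalibrated model (whose true squared canonical calibration error is roughly 0.1 here) should score noticeably higher; the selection of `(gamma, lam)` on a validation split is what the abstract's training-validation-testing pipeline refers to.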