🤖 AI Summary
This work studies the regret (i.e., excess risk) incurred by using approximate posterior probabilities in binary classification decisions, aiming to quantify its components and guide the choice of post-training strategy. Methodologically, it combines calibration-curve analysis, multicalibration, and the maximum expected utility decision framework to derive estimable theoretical bounds. The key contribution is a decomposition of the regret into the sum of a miscalibration-induced term and a grouping-loss term: it gives analytical expressions for the former and tight upper and lower bounds on the latter, the residual regret of calibrated classifiers. When the regret stems predominantly from miscalibration, recalibration suffices; otherwise, more sophisticated post-training is required. Empirical evaluation on NLP tasks demonstrates that this decomposition effectively informs strategy choice, and that multicalibration can serve as a low-cost alternative to fine-tuning while preserving decision-theoretic performance.
📝 Abstract
Probabilistic classifiers are central to making informed decisions under uncertainty. Based on the maximum expected utility principle, optimal decision rules can be derived from the posterior class probabilities and misclassification costs. Yet, in practice, only learned approximations of the oracle posterior probabilities are available. In this work, we quantify the excess risk (a.k.a. regret) incurred by using approximate posterior probabilities in batch binary decision-making. We provide analytical expressions for the miscalibration-induced regret ($R^{\mathrm{CL}}$), as well as tight and informative upper and lower bounds on the regret of calibrated classifiers ($R^{\mathrm{GL}}$). These expressions allow us to identify regimes where recalibration alone addresses most of the regret, and regimes where the regret is dominated by the grouping loss, which calls for post-training beyond recalibration. Crucially, both $R^{\mathrm{CL}}$ and $R^{\mathrm{GL}}$ can be estimated in practice using a calibration curve and a recent grouping-loss estimator. In NLP experiments, we show that these quantities identify when the expected gain of more advanced post-training is worth the operational cost. Finally, we highlight the potential of multicalibration approaches as efficient alternatives to costlier fine-tuning.
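To make the decomposition concrete, here is a toy, binning-based sketch of how the miscalibration-induced component of the regret could be approximated from a calibration curve. This is an illustrative assumption, not the estimator used in the paper: it assumes 0-1 loss with symmetric misclassification costs (decision threshold 0.5), and the helper names are hypothetical. Regret accrues only in bins where the raw score and its calibrated value fall on opposite sides of the threshold, in which case acting against the Bayes decision on posterior $a$ costs $|a - (1-a)| = |2a - 1|$ per example.

```python
import numpy as np

def calibration_curve(scores, labels, n_bins=10):
    """Binned calibration curve: per-bin mean score ("confidence"),
    empirical positive rate, and bin weight (fraction of samples).
    Empty bins are skipped."""
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    conf, acc, weight = [], [], []
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        conf.append(scores[mask].mean())
        acc.append(labels[mask].mean())
        weight.append(mask.mean())
    return np.array(conf), np.array(acc), np.array(weight)

def miscalibration_regret(scores, labels, n_bins=10):
    """Toy estimate of miscalibration-induced decision regret for
    0-1 loss at threshold 0.5 (hypothetical helper, not the paper's
    estimator). Only bins where the raw score and the calibrated
    probability disagree about the decision contribute; each such
    example costs |2a - 1|, the gap between the two conditional risks."""
    conf, acc, weight = calibration_curve(scores, labels, n_bins)
    flipped = (conf >= 0.5) != (acc >= 0.5)
    return float(np.sum(weight[flipped] * np.abs(2 * acc[flipped] - 1)))
```

In this stylized view, a classifier that scores 0.6 on examples whose true positive rate is 0.3 makes the wrong decision on that whole group, and recalibration alone recovers the loss; grouping loss, by contrast, is the regret that remains even after the scores are calibrated, which no monotone recalibration map can remove.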