🤖 AI Summary
This paper addresses the computational intractability of calibration decision loss ($\mathsf{CDL}$) in the offline decision-making setting. We propose a computable approximation, $\mathsf{CDL}_K$, defined over a restricted post-processing function family $K$. Methodologically, we impose structural constraints on the post-processing space and leverage properties of proper losses together with information-theoretic analysis to establish a theoretical framework under black-box predictor access. Our main contributions are threefold: (i) the first systematic characterization of the information-theoretic nature and computational tractability conditions of $\mathsf{CDL}_K$; (ii) derivation of tight upper and lower bounds, proving that $\mathsf{CDL}_K$ is polynomial-time computable for natural function classes—including piecewise-constant, monotonic, and neural network families; and (iii) the first rigorous theoretical justification for classical recalibration methods such as temperature scaling and equal-frequency binning.
📝 Abstract
A decision-theoretic characterization of perfect calibration is that an agent seeking to minimize a proper loss in expectation cannot improve their outcome by post-processing a perfectly calibrated predictor. Hu and Wu (FOCS'24) use this to define an approximate calibration measure called calibration decision loss ($\mathsf{CDL}$), which measures the maximal improvement achievable by any post-processing over any proper loss. Unfortunately, $\mathsf{CDL}$ turns out to be intractable to even weakly approximate in the offline setting, given black-box access to the predictions and labels.
We suggest circumventing this by restricting attention to structured families of post-processing functions $K$. We define the calibration decision loss relative to $K$, denoted $\mathsf{CDL}_K$, in which we consider all proper losses but restrict post-processings to the structured family $K$. We develop a comprehensive theory of when $\mathsf{CDL}_K$ is information-theoretically and computationally tractable, and use it to prove both upper and lower bounds for natural classes $K$. In addition to introducing new definitions and algorithmic techniques to the theory of calibration for decision making, our results give rigorous guarantees for some widely used recalibration procedures in machine learning.
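To make the quantity concrete, here is a minimal sketch of the empirical improvement that $\mathsf{CDL}_K$ measures, under illustrative assumptions not taken from the paper: we fix a single proper loss (squared loss) rather than maximizing over all proper losses, and take $K$ to be a small finite family of temperature-style shrinkage maps. Maximizing the improvement over this finite $K$ therefore only lower-bounds the true $\mathsf{CDL}_K$; the function names and the toy data are hypothetical.

```python
import numpy as np

def empirical_improvement(preds, labels, post, loss=lambda q, y: (q - y) ** 2):
    """Average reduction in (squared) loss from replacing preds with post(preds).

    A positive value means the post-processing `post` helps a squared-loss
    decision maker, which is evidence of miscalibration.
    """
    base = np.mean(loss(preds, labels))
    processed = np.mean(loss(post(preds), labels))
    return base - processed

def cdl_lower_bound(preds, labels, family):
    """Max improvement over a finite family K of post-processings.

    With squared loss only, this lower-bounds CDL_K (which also
    maximizes over all proper losses).
    """
    return max(empirical_improvement(preds, labels, k) for k in family)

# Toy example: a systematically overconfident constant predictor.
rng = np.random.default_rng(0)
labels = rng.binomial(1, 0.3, size=10_000)   # true positive rate 0.3
preds = np.full(10_000, 0.6)                 # predictor always says 0.6

# Hypothetical family K: multiplicative shrinkage maps p -> clip(t * p).
family = [lambda p, t=t: np.clip(t * p, 0.0, 1.0) for t in (0.5, 0.7, 1.0)]

print(cdl_lower_bound(preds, labels, family))  # positive: post-processing helps
```

The shrinkage map with $t = 0.5$ sends the constant prediction $0.6$ to $0.3$, matching the true label rate, so the maximum over this family recovers roughly the full improvement available to a squared-loss decision maker here; the identity map ($t = 1.0$) contributes zero, as it must for any loss.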