🤖 AI Summary
This work addresses the estimation of the second-order calibration error in binary classification. This error quantifies whether a higher-order predictor's epistemic-uncertainty estimate matches the conditional variance of the label probability on the predictor's level sets. The authors observe that the sech perturbation kernel, previously used only to enforce smoothness, makes the calibration functions analytic, so polynomial regression can estimate the error efficiently. Using tools from analytic function theory and minimax analysis, they show that this estimator attains an error rate of Õ(1/√n), improving on the O(n^(-1/4)) rate of bucketing or kernel smoothing. They also prove a matching Ω(1/√n) lower bound, so the rate is minimax optimal up to logarithmic factors. Along the way they provide a bucket-free definition of second-order calibration and, as a corollary, the first finite-sample guarantee for second-order Platt scaling. Experiments confirm the predicted convergence rate and the quality of the recalibrated uncertainty estimates.
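The central mechanism, sech smoothing followed by polynomial regression, can be illustrated numerically. The assumptions here are ours, not the authors'. The kernel is taken to be the density K_h(t) = sech(t/h)/(πh). It smooths a hypothetical step-shaped calibration function on [0, 1], and least-squares Chebyshev fits (`numpy.polynomial.chebyshev.chebfit`) of the raw and smoothed functions are then compared. The helper names (`sech_kernel`, `step_calibration`, `sech_smooth`, `max_fit_error`), the bandwidth h = 0.2 and the step function are all illustrative. This is a sketch of the approximation-theoretic idea, not the paper's estimator, which works from noisy labels rather than a known function.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def sech_kernel(t, h):
    """Assumed kernel shape: density proportional to sech(t/h).

    The factor 1/(pi*h) normalizes it, since the integral of sech(t/h) over R is pi*h.
    """
    return 1.0 / (np.pi * h * np.cosh(t / h))


def step_calibration(u):
    """Hypothetical, deliberately discontinuous calibration function on [0, 1]."""
    return np.where(u < 0.5, 0.2, 0.7)


def sech_smooth(g, x, h, grid_size=2001):
    """Riemann-sum approximation of (g * K_h)(x) = integral over u in [0, 1] of g(u) K_h(x - u).

    Every term sech((x - u_j)/h) is analytic for |Im x| < h*pi/2, so the
    computed function is itself analytic in that strip.
    """
    u = np.linspace(0.0, 1.0, grid_size)
    du = u[1] - u[0]
    weights = sech_kernel(x[:, None] - u[None, :], h)  # shape (len(x), grid_size)
    return (weights * g(u)[None, :]).sum(axis=1) * du


def max_fit_error(values, x, degree):
    """Sup-norm error on the grid of a least-squares Chebyshev fit of the given degree."""
    s = 2.0 * x - 1.0  # map [0, 1] onto [-1, 1], the natural Chebyshev interval
    coef = cheb.chebfit(s, values, degree)
    return np.max(np.abs(cheb.chebval(s, coef) - values))


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 1001)
    raw = step_calibration(x)
    smooth = sech_smooth(step_calibration, x, h=0.2)
    for degree in (5, 10, 20, 30):
        # Columns: degree, raw-fit error, smoothed-fit error
        print(degree, max_fit_error(raw, x, degree), max_fit_error(smooth, x, degree))
```

On the discontinuous function, the sup-norm error cannot fall much below half the jump (0.25 here), whatever the degree. On the smoothed function it should shrink roughly geometrically in the degree. That is the approximation-theoretic reason a polynomial degree of order log n can suffice.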
📝 Abstract
We characterize the minimax rate of estimating the second-order calibration error for binary classification, which quantifies whether a higher-order predictor's epistemic-uncertainty estimate matches the conditional variance of the label probability on its level sets. Our key observation is that the sech perturbation kernel, previously used only to enforce smoothness of calibration functions, in fact makes them analytic in a strip of half-width $h\pi/2$. Polynomial regression then estimates the calibration error at rate $\tilde{O}(1/\sqrt{n})$, with explicit constants, a qualitative improvement over the $O(n^{-1/4})$ rate achievable by bucketing or kernel smoothing. A matching $\Omega(1/\sqrt{n})$ lower bound establishes minimax optimality up to logarithmic factors. As a corollary, we give the first finite-sample guarantee for second-order Platt scaling, yielding a post-hoc procedure that recalibrates both the mean prediction and the epistemic-variance estimate of any higher-order predictor. Along the way, we provide a bucket-free definition of second-order calibration and relate it quantitatively to the bucketed formulation of Ahdritz et al. [2025]. Our experiments confirm the predicted rate and the quality of the recalibrated uncertainties.
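The strip half-width $h\pi/2$ can be checked directly from the zeros of $\cosh$. The derivation below assumes the kernel is normalized as $K_h(t) = \operatorname{sech}(t/h)/(\pi h)$ and applied to a calibration function $g$ on $[0,1]$ by convolution. Its last line is the classical Chebyshev truncation bound, included to sketch why analyticity helps polynomial regression. It is not the paper's proof.

```latex
% Assumption: K_h(t) = sech(t/h) / (pi h), applied by convolution on [0, 1].
\begin{aligned}
\cosh(a+ib) &= \cosh a\,\cos b + i\,\sinh a\,\sin b = 0
  \iff a = 0,\; b = \tfrac{\pi}{2} + k\pi \ (k \in \mathbb{Z}),\\
|\cosh(a+ib)|^2 &= \sinh^2 a + \cos^2 b
  \;\Longrightarrow\; |K_h(z)| \le \frac{1}{\pi h \cos\beta}
  \quad \text{for } |\operatorname{Im} z| \le \beta h,\ \beta < \tfrac{\pi}{2},\\
G(z) &= \int_0^1 g(u)\,K_h(z-u)\,du
  \ \text{ is analytic on } \{|\operatorname{Im} z| < h\pi/2\} \text{ for bounded } g,\\
\rho_h &= h\pi + \sqrt{1 + h^2\pi^2}, \qquad
  \inf_{\deg p \le d} \|G - p\|_{\infty,[0,1]} \le \frac{2M_\rho\,\rho^{-d}}{\rho - 1}
  \quad (1 < \rho < \rho_h).
\end{aligned}
```

The first line places the poles of $\operatorname{sech}(z/h)$ at $\operatorname{Im} z = \pm h\pi/2$, and the second gives a uniform bound strictly inside the strip. In the variable $s = 2x - 1$ the strip becomes $|\operatorname{Im} s| < h\pi$. The Bernstein ellipse $E_\rho$ (foci $\pm 1$, semi-axis sum $\rho$) has semi-minor axis $(\rho - \rho^{-1})/2$, so it lies inside that strip exactly when $\rho < \rho_h$, and $M_\rho$ denotes the maximum of $|G|$ on $E_\rho$. Taking $d \asymp \log n / (2\log\rho)$ makes the approximation bias $O(n^{-1/2})$. A heuristic count of $d+1$ least-squares coefficients then contributes roughly $\sqrt{d/n}$ estimation error, which is consistent with the $\tilde{O}(1/\sqrt{n})$ rate. The paper's actual argument and constants may differ.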