🤖 AI Summary
This paper addresses finite-sample estimation of the L₁ calibration error of binary classifiers. Existing approaches rely on strong distributional assumptions or on asymptotic analysis, which limits their practical applicability. To overcome these limitations, we propose a distribution-free, non-asymptotic theoretical framework. First, we derive an upper bound on the L₁ calibration error for classifiers whose calibration function has bounded variation, a new result in calibration theory. Second, we design a general-purpose post-hoc modification that makes the calibration error of any classifier efficiently upper-boundable, without retraining and without significantly affecting predictive performance. Our approach combines bounded-variation function analysis, nonparametric calibration modeling, and distribution-free probabilistic inequalities. Experiments on multiple benchmark datasets show that the framework supports reliable calibration assessment with low computational overhead and improves both the accuracy and the robustness of L₁ calibration error estimation, particularly in small-sample regimes.
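For concreteness, the quantity being estimated is $\mathbb{E}\big[\,|\,\mathbb{E}[Y \mid f(X)] - f(X)\,|\,\big]$, where $f(X)$ is the predicted probability and $\mathbb{E}[Y \mid f(X)]$ is the calibration function. Below is a minimal sketch of the standard binned plug-in estimator of this quantity, the kind of finite-sample estimate the paper analyzes. The function name, the equal-width binning, and the choice of 15 bins are illustrative assumptions, not the paper's prescription.

```python
import numpy as np

def binned_l1_calibration_error(probs, labels, n_bins=15):
    """Plug-in estimate of the L1 calibration error E|E[Y | f(X)] - f(X)|
    using equal-width bins over [0, 1]. Bin count is an arbitrary choice."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin using the interior edges, so that
    # indices land in 0..n_bins-1 and p = 1.0 falls in the last bin.
    idx = np.digitize(probs, edges[1:-1])
    n = len(probs)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue
        conf = probs[mask].mean()   # average predicted probability in the bin
        acc = labels[mask].mean()   # empirical frequency of the positive class
        ece += (mask.sum() / n) * abs(acc - conf)
    return ece

# Example: a slightly overconfident classifier on synthetic data.
rng = np.random.default_rng(0)
p_true = rng.uniform(size=2000)
y = (rng.uniform(size=2000) < p_true).astype(float)
p_pred = np.clip(p_true * 1.2, 0.0, 1.0)   # miscalibrated predictions
print(binned_l1_calibration_error(p_pred, y))
```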
📝 Abstract
We make two contributions to the problem of estimating the $L_1$ calibration error of a binary classifier from a finite dataset. First, we provide an upper bound on the $L_1$ calibration error of any classifier whose calibration function has bounded variation. Second, we provide a method for modifying any classifier so that its calibration error can be upper-bounded efficiently, without significantly impacting classifier performance and without any restrictive assumptions. All our results are non-asymptotic and distribution-free. We conclude with advice on how to measure calibration error in practice. Our methods yield practical procedures that can be run on real-world datasets with modest overhead.
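To illustrate the distribution-free, non-asymptotic flavor of such guarantees, here is a back-of-the-envelope high-probability upper bound for a fixed binning: the plug-in estimate plus a Hoeffding confidence width per bin, combined via a union bound over bins. This is a sketch under an i.i.d. sampling assumption and is not the paper's bounded-variation bound; `binned_ce_upper_bound`, `n_bins`, and `delta` are hypothetical names and choices.

```python
import numpy as np

def binned_ce_upper_bound(probs, labels, n_bins=15, delta=0.05):
    """Illustrative bound: with probability >= 1 - delta, the binned L1
    calibration error is at most the plug-in estimate plus per-bin
    Hoeffding widths. NOT the paper's bounded-variation result."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = len(probs)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(probs, edges[1:-1])
    bound = 0.0
    for b in range(n_bins):
        mask = idx == b
        n_b = int(mask.sum())
        if n_b == 0:
            continue
        gap = abs(labels[mask].mean() - probs[mask].mean())
        # Hoeffding width sized so all bins hold simultaneously
        # with probability at least 1 - delta (union bound).
        width = np.sqrt(np.log(2 * n_bins / delta) / (2 * n_b))
        bound += (n_b / n) * (gap + width)
    return bound
```

Note how the confidence widths dominate in sparsely populated bins; this is the small-sample failure mode that non-asymptotic analyses of calibration error estimation must control.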