🤖 AI Summary
This paper addresses three key challenges in assessing the calibration of probabilistic models: high computational cost, poor scalability, and difficulty in controlling the Type-I error. To this end, the authors propose the Kernel Calibration Conditional Stein Discrepancy (KCCSD) test, a nonparametric, kernel-based hypothesis test. KCCSD introduces a new family of score-based kernels that enable density-free estimation, depending on the model only through its score, and combines a Stein discrepancy with a conditional goodness-of-fit testing framework, thereby avoiding explicit approximation of expectations. The test statistic is constructed as a U-statistic, which keeps the test computationally efficient and scalable. Theoretically, KCCSD provides finite-sample guarantees on Type-I error control under mild regularity conditions. Empirical evaluations on diverse synthetic benchmarks show that KCCSD outperforms existing methods, achieving higher statistical power, favorable scaling with sample size and dimensionality, and reliable control of both Type-I and Type-II errors.
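For intuition about the density-free ingredient, the block below is a minimal sketch, not the paper's implementation: it shows the standard Langevin Stein kernel built from an RBF base kernel, which can be evaluated using only the model's score (the gradient of the log density), never the density itself. The function name, signature, and bandwidth choice are hypothetical.

```python
import numpy as np

def stein_kernel_rbf(x, y, score_x, score_y, sigma=1.0):
    """Langevin Stein kernel u_p(x, y) with an RBF base kernel.

    Depends on the model only through its score s_p = grad log p,
    so no density evaluations or samples from p are needed.
    """
    d = x.shape[0]
    diff = x - y
    sq = diff @ diff
    k = np.exp(-sq / (2.0 * sigma**2))               # base kernel k(x, y)
    grad_x_k = -diff / sigma**2 * k                  # nabla_x k(x, y)
    grad_y_k = diff / sigma**2 * k                   # nabla_y k(x, y)
    trace_term = (d / sigma**2 - sq / sigma**4) * k  # tr(nabla_x nabla_y k)
    return (score_x @ score_y * k
            + score_x @ grad_y_k
            + score_y @ grad_x_k
            + trace_term)
```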
📝 Abstract
We introduce the Kernel Calibration Conditional Stein Discrepancy test (KCCSD test), a non-parametric, kernel-based test for assessing the calibration of probabilistic models with well-defined scores. In contrast to previous methods, our test avoids the need for possibly expensive expectation approximations while providing control over its type-I error. We achieve these improvements by using a new family of kernels for score-based probabilities that can be estimated without probability density samples, and by using a conditional goodness-of-fit criterion for the KCCSD test's U-statistic. We demonstrate the properties of our test on various synthetic settings.
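As a companion to the sketch above, the following shows how a Stein kernel yields a cheap test statistic via a U-statistic. This is the generic (unconditional) kernel Stein discrepancy estimator, given for illustration only; the KCCSD statistic itself is its conditional goodness-of-fit analogue. It reuses the hypothetical stein_kernel_rbf from the earlier sketch.

```python
import numpy as np

# Assumes stein_kernel_rbf from the sketch above is in scope.

def ksd_u_statistic(samples, score_fn, sigma=1.0):
    """Unbiased U-statistic estimate of the squared kernel Stein
    discrepancy: the Stein kernel averaged over distinct pairs."""
    n = len(samples)
    scores = np.array([score_fn(x) for x in samples])
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += stein_kernel_rbf(samples[i], samples[j],
                                          scores[i], scores[j], sigma)
    return total / (n * (n - 1))

# Under H0 (samples drawn from the model), the statistic concentrates
# near zero; e.g. for a standard normal the score is s(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
print(ksd_u_statistic(X, score_fn=lambda x: -x))
```

The explicit double loop makes the O(n^2) pairwise cost visible; a practical implementation would vectorize the kernel matrix.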