🤖 AI Summary
This paper addresses three key challenges in assessing the calibration of probabilistic models: high computational cost, poor scalability, and difficulty in controlling the Type-I error. To this end, the authors propose the Kernel Calibration Conditional Stein Discrepancy (KCCSD) test, a nonparametric, kernel-based hypothesis test. KCCSD introduces a new family of score-based kernels that enable density-free estimation, depending on the model only through its score, and combines a Stein discrepancy with a conditional goodness-of-fit testing framework, thereby avoiding explicit approximation of expectations. The test statistic is constructed as a U-statistic, which keeps the test computationally efficient and scalable. Theoretically, KCCSD provides finite-sample guarantees on Type-I error control under mild regularity conditions. Empirical evaluations on diverse synthetic benchmarks show that KCCSD outperforms existing methods, achieving higher statistical power, favorable scaling with sample size and dimensionality, and reliable control of both Type-I and Type-II errors.
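For intuition about the density-free ingredient, the block below is a minimal sketch, not the paper's implementation: it shows the standard Langevin Stein kernel built from an RBF base kernel, which can be evaluated using only the model's score (the gradient of the log density), never the density itself. The function name, signature, and bandwidth choice are hypothetical.

```python
import numpy as np

def stein_kernel_rbf(x, y, score_x, score_y, sigma=1.0):
    """Langevin Stein kernel u_p(x, y) with an RBF base kernel.

    Depends on the model only through its score s_p = grad log p,
    so no density evaluations or samples from p are needed.
    """
    d = x.shape[0]
    diff = x - y
    sq = diff @ diff
    k = np.exp(-sq / (2.0 * sigma**2))               # base kernel k(x, y)
    grad_x_k = -diff / sigma**2 * k                  # nabla_x k(x, y)
    grad_y_k = diff / sigma**2 * k                   # nabla_y k(x, y)
    trace_term = (d / sigma**2 - sq / sigma**4) * k  # tr(nabla_x nabla_y k)
    return (score_x @ score_y * k
            + score_x @ grad_y_k
            + score_y @ grad_x_k
            + trace_term)
```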
📝 Abstract
We introduce the Kernel Calibration Conditional Stein Discrepancy test (KCCSD test), a non-parametric, kernel-based test for assessing the calibration of probabilistic models with well-defined scores. In contrast to previous methods, our test avoids the need for possibly expensive expectation approximations while providing control over its type-I error. We achieve these improvements by using a new family of kernels for score-based probabilities that can be estimated without probability density samples, and by using a conditional goodness-of-fit criterion for the KCCSD test's U-statistic. We demonstrate the properties of our test on various synthetic settings.
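As a companion to the sketch above, the following shows how a Stein kernel yields a cheap test statistic via a U-statistic. This is the generic (unconditional) kernel Stein discrepancy estimator, given for illustration only; the KCCSD statistic itself is its conditional goodness-of-fit analogue. It reuses the hypothetical stein_kernel_rbf from the earlier sketch.

```python
import numpy as np

# Assumes stein_kernel_rbf from the sketch above is in scope.

def ksd_u_statistic(samples, score_fn, sigma=1.0):
    """Unbiased U-statistic estimate of the squared kernel Stein
    discrepancy: the Stein kernel averaged over distinct pairs."""
    n = len(samples)
    scores = np.array([score_fn(x) for x in samples])
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += stein_kernel_rbf(samples[i], samples[j],
                                          scores[i], scores[j], sigma)
    return total / (n * (n - 1))

# Under H0 (samples drawn from the model), the statistic concentrates
# near zero; e.g. for a standard normal the score is s(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
print(ksd_u_statistic(X, score_fn=lambda x: -x))
```

The explicit double loop makes the O(n^2) pairwise cost visible; a practical implementation would vectorize the kernel matrix.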