Composite Goodness-of-fit Tests with Kernels

📅 2021-11-19
🏛️ arXiv.org
📈 Citations: 14
Influential: 0
🤖 AI Summary
This paper addresses goodness-of-fit testing under composite hypotheses: does the data come from any distribution in a given parametric family? The authors propose a unified, kernel-based testing framework that requires no data splitting, enabling parameter estimation and hypothesis testing to share the same dataset while rigorously controlling the Type-I error rate. The method accommodates unnormalised densities and simulator-based models without requiring explicit density evaluation. By combining the maximum mean discrepancy (MMD), the kernel Stein discrepancy (KSD), and minimum distance estimation within a theoretically grounded composite-null testing framework, it improves statistical power and broadens applicability. Experiments on unnormalised density models and a biological cellular-network simulator demonstrate its effectiveness. Theoretically, the test level is precisely controlled, extending the scope of goodness-of-fit testing to complex, intractable models.
📝 Abstract
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to the development of a range of robust methods which directly account for this issue. However, whether these more involved methods are required will depend on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. In this paper, we propose one such method. More precisely, we propose kernel-based hypothesis tests for the challenging composite testing problem, where we are interested in whether the data comes from any distribution in some parametric family. Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy. They are widely applicable, including whenever the density of the parametric model is known up to a normalisation constant, or if the model takes the form of a simulator. As our main result, we show that we are able to estimate the parameter and conduct our test on the same data (without data splitting), while maintaining a correct test level. Our approach is illustrated on a range of problems, including testing for goodness-of-fit of an unnormalised non-parametric density model, and an intractable generative model of a biological cellular network.
Problem

Research questions and friction points this paper is trying to address.

Tests model misspecification in parametric families
Uses kernel-based methods without data splitting
Applies to unnormalized and intractable generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernel-based hypothesis tests for composite testing
Minimum distance estimators using MMD and KSD
Single-data parameter estimation and testing
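The "minimum distance estimators using MMD" idea above can be illustrated with a toy sketch. The snippet below is not the paper's procedure: the Gaussian kernel, the fixed bandwidth, the one-dimensional location family N(theta, 1), and the grid search are all illustrative assumptions, and the subsequent bootstrap test that the paper pairs with the estimator is omitted entirely. It only shows the core mechanic: pick the parameter whose simulated sample is closest to the observed data in MMD.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) for 1-D samples x and y."""
    d = x[:, None] - y[None, :]
    return np.exp(-d ** 2 / (2 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)  # observed data (true loc = 2)

# Minimum-MMD estimation over the hypothetical location family N(theta, 1):
# only simulator draws are needed, never density evaluations.
# Grid search is used purely for illustration; the simulator noise is fixed
# across thetas so the loss curve is smooth in theta.
thetas = np.linspace(-5.0, 5.0, 101)
sim_noise = rng.normal(size=500)
losses = [mmd2(data, theta + sim_noise) for theta in thetas]
theta_hat = thetas[int(np.argmin(losses))]
```

In a composite test, a statistic such as the MMD (or KSD) evaluated at `theta_hat` would then be compared against a bootstrap threshold; the paper's contribution is showing this can be done on the same data used for estimation while keeping the test level correct.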
Oscar Key
University College London
machine learning · scalable algorithms · ml systems
T. Fernandez
Faculty of Engineering and Science, Adolfo Ibañez University
A. Gretton
Gatsby Computational Neuroscience Unit, University College London
F. Briol
Department of Statistical Science, University College London