🤖 AI Summary
Existing goodness-of-fit tests that sample conditionally on sufficient statistics often fail when the null model lacks a closed-form sufficient statistic, leading to limited applicability, reduced statistical power, or computational inefficiency. To address this, we propose aCSS-B, an asymptotically valid frequentist conditional testing framework that leverages Bayesian posterior samples as approximate sufficient statistics. By integrating MCMC-based posterior sampling with conditional data generation, aCSS-B enables model-fit assessment without requiring explicit sufficient statistics. We establish its asymptotic validity under mild regularity conditions and demonstrate high power and robustness across challenging settings, including generalized linear models and latent variable models, where no existing goodness-of-fit test is applicable. Our key contribution is the first systematic use of posterior samples to construct exchangeable conditional distributions, substantially extending the scope of exchangeable data-generation frameworks to models lacking analytic sufficient statistics.
📝 Abstract
Tests of goodness of fit are used in nearly every domain where statistics is applied. One powerful and flexible approach is to sample artificial data sets that are exchangeable with the real data under the null hypothesis (but not under the alternative), as this allows the analyst to conduct a valid test with any test statistic they desire. Such sampling is typically done by conditioning on an exact or approximate sufficient statistic, but existing methods for doing so have significant limitations that either preclude their use or substantially reduce their power or computational tractability for many important models. In this paper, we propose to condition on samples from a Bayesian posterior distribution, which constitute a very different type of approximate sufficient statistic than those considered in prior work. Our approach, approximately co-sufficient sampling via Bayes (aCSS-B), considerably expands the scope of this flexible type of goodness-of-fit testing. We prove the approximate validity of the resulting test, demonstrate its utility on three common null models where no existing methods apply, and show that it outperforms existing methods on models where they do apply.
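The exchangeability argument in the abstract translates into a simple rank-based test: if the artificial copies are exchangeable with the observed data under the null, the rank of the observed test statistic among all copies is uniform, so the resulting p-value is valid for any choice of statistic. A minimal sketch of that p-value computation is below; the toy example uses a *simple* null N(0, 1), where i.i.d. copies are trivially exchangeable (the paper's contribution, aCSS-B, is how to obtain such copies for composite nulls via posterior samples, which this sketch does not implement). All names here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def exchangeable_pvalue(t_obs, t_copies):
    """Rank-based p-value: valid whenever the copies are exchangeable with
    the observed data under the null. The add-one correction makes the
    test exact at any finite number of copies M."""
    t_copies = np.asarray(t_copies)
    return (1 + np.sum(t_copies >= t_obs)) / (len(t_copies) + 1)

# Toy illustration with a simple null N(0, 1): i.i.d. draws from the null
# are exchangeable with data that also comes from the null.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)            # observed data, consistent with the null
T = lambda d: np.abs(np.mean(d))        # the analyst may pick ANY test statistic
copies = [T(rng.standard_normal(100)) for _ in range(999)]
p = exchangeable_pvalue(T(x), copies)   # uniformly distributed under the null
```

The same `exchangeable_pvalue` function applies unchanged once the copies are generated by conditioning on an (approximate) sufficient statistic, which is where aCSS-B's posterior-sample construction enters.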