🤖 AI Summary
This paper addresses homogeneity testing in meta-analyses under binomial random-effects models: assessing whether treatment effects agree across multiple small-sample studies. We propose a goodness-of-fit test that is minimax optimal under the 1-Wasserstein distance. To our knowledge, this is the first work to establish a minimax-optimal testing theory within the binomial random-effects framework. When a reference effect is specified, the test combines a plug-in estimator of the Wasserstein distance with a debiased version of Pearson's chi-squared statistic; when no reference effect is given, an optimal test is obtained by debiasing Cochran's chi-squared statistic. Together, these cover homogeneity testing both with and without a reference effect. The approach improves the construction of *p*-values and confidence intervals in applications such as drug safety assessment for rare adverse events and county-level political outcome modeling. The separation rate the test requires matches the minimax critical separation.
📝 Abstract
In modern scientific research, small-scale studies with limited participants are increasingly common. However, interpreting individual outcomes can be challenging, making it standard practice to combine data across studies using random effects to draw broader scientific conclusions. In this work, we introduce an optimal methodology for assessing the goodness of fit between a given reference distribution and the distribution of random effects arising from binomial counts. Using the minimax framework, we characterize the smallest separation between the null and alternative hypotheses, called the critical separation, under the 1-Wasserstein distance that ensures the existence of a valid and powerful test. The optimal test combines a plug-in estimator of the Wasserstein distance with a debiased version of Pearson's chi-squared test. We focus on meta-analyses, where a key question is whether multiple studies agree on a treatment's effectiveness before pooling data. That is, researchers must determine whether treatment effects are homogeneous across studies. We begin by analyzing scenarios with a specified reference effect, such as testing whether all studies show the treatment is effective 80% of the time, and describe how the critical separation depends on the reference effect. We then extend the analysis to homogeneity testing without a reference effect and construct an optimal test by debiasing Cochran's chi-squared test. Finally, we illustrate how our proposed methodologies improve the construction of p-values and confidence intervals, with applications to assessing drug safety in the context of rare adverse outcomes and modeling political outcomes at the county level.
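To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of their classical, non-debiased forms: a naive plug-in 1-Wasserstein distance between the observed proportions and a point-mass reference effect, Pearson's chi-squared statistic against that reference, and Cochran's chi-squared homogeneity statistic. It is not the optimal test from the paper, whose debiasing steps are not reproduced here. The function names and the layout of the inputs (per-study success counts and trial counts) are assumptions made for illustration.

```python
from typing import Sequence


def plugin_w1_to_point(successes: Sequence[int], trials: Sequence[int], p0: float) -> float:
    """Naive plug-in 1-Wasserstein distance to a point mass at p0.

    Against a point mass, W1 reduces to the mean absolute deviation of the
    observed proportions x_i / t_i from p0.
    """
    props = [x / t for x, t in zip(successes, trials)]
    return sum(abs(p - p0) for p in props) / len(props)


def pearson_chi2(successes: Sequence[int], trials: Sequence[int], p0: float) -> float:
    """Classical (not debiased) Pearson chi-squared statistic for a fixed p0 in (0, 1)."""
    return sum(
        (x - t * p0) ** 2 / (t * p0 * (1 - p0))
        for x, t in zip(successes, trials)
    )


def cochran_chi2(successes: Sequence[int], trials: Sequence[int]) -> float:
    """Classical (not debiased) Cochran chi-squared homogeneity statistic.

    The unknown common effect is replaced by the pooled proportion.
    """
    pooled = sum(successes) / sum(trials)
    if pooled in (0.0, 1.0):
        # Every study shows all failures or all successes: no observed spread.
        return 0.0
    return sum(
        (x - t * pooled) ** 2 / (t * pooled * (1 - pooled))
        for x, t in zip(successes, trials)
    )
```

For example, with four hypothetical studies of ten participants each and success counts 8, 7, 9, 8, calling `pearson_chi2([8, 7, 9, 8], [10, 10, 10, 10], 0.8)` measures the departure from the 80% reference effect, while `cochran_chi2([8, 7, 9, 8], [10, 10, 10, 10])` measures the departure from the pooled proportion without any reference. With so few participants per study, these classical statistics are the quantities the paper proposes to debias.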