🤖 AI Summary
Integrating hypothesis-testing results across heterogeneous multi-source studies, each reporting only its binary significance decisions and the FDR level it controlled, poses a fundamental challenge for rigorous, unified FDR control. Method: We propose the Integrative Ranking and Thresholding (IRT) framework, which operates solely on each study's binary rejection decisions, its FDR level, and the set of hypotheses it tested, requiring no raw data, p-values, or effect sizes. IRT employs nonparametric evidence aggregation and a ranking-driven thresholding mechanism, circumventing traditional meta-analysis assumptions of statistical homogeneity and reliance on shared summary statistics. Contribution/Results: IRT is the first method to achieve theoretically guaranteed FDR control when no statistical summaries are shared across studies. We prove its FDR control property rigorously; simulations demonstrate superior performance over state-of-the-art integration methods; and a real-world application to multi-center genome-wide association studies confirms its practical utility and robustness.
📝 Abstract
Learning from the collective wisdom of crowds is related to the statistical notion of fusion learning from multiple data sources or studies. However, fusing inferences from diverse sources is challenging, since cross-source heterogeneity and potential data sharing complicate statistical inference. Moreover, studies may rely on disparate designs, employ myriad modeling techniques, and prevailing data privacy norms may forbid sharing even summary statistics across studies for an overall analysis. We propose an Integrative Ranking and Thresholding (IRT) framework for fusion learning in multiple testing. IRT operates in the setting where each study makes available a triplet: the vector of binary accept-reject decisions on the tested hypotheses, its False Discovery Rate (FDR) level, and the hypotheses it tested. Under this setting, IRT constructs an aggregated, nonparametric measure of evidence against each null hypothesis, which facilitates ranking the hypotheses by their likelihood of being rejected. We show that IRT guarantees overall FDR control if the studies control their respective FDRs at the desired levels. IRT is extremely flexible, and a comprehensive numerical study demonstrates its practical relevance for pooling inferences. A real-data illustration and extensions to alternative forms of Type I error control are also discussed.
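To make the triplet-based setting concrete, here is a minimal sketch of the kind of inputs IRT consumes and of the ranking step the abstract describes: each study contributes only its 0/1 rejection decisions and the FDR level it controlled, and hypotheses are ordered by an aggregated evidence score. The weighting scheme (`-log q` per study) and the aggregation by weighted voting are illustrative placeholders chosen here for the example; they are not the paper's actual evidence measure or thresholding rule.

```python
from math import log

def aggregate_and_rank(decisions, fdr_levels):
    """Aggregate binary rejection decisions from several studies into a
    per-hypothesis evidence score and rank hypotheses by it.

    decisions  -- list of per-study 0/1 decision vectors over the same
                  hypotheses (1 = study rejected that null)
    fdr_levels -- the FDR level each study controlled

    The weighting (stricter study => larger weight) is an assumed,
    illustrative choice, not the IRT evidence measure from the paper.
    """
    n = len(decisions[0])
    weights = [-log(q) for q in fdr_levels]          # assumption: weight = -log(q)
    evidence = [sum(w * row[j] for w, row in zip(weights, decisions))
                for j in range(n)]                   # weighted vote per hypothesis
    ranking = sorted(range(n), key=lambda j: -evidence[j])  # strongest first
    return evidence, ranking

# Three studies over five hypotheses; each study reports only its
# decision vector and the FDR level it controlled.
D = [[1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 1]]
q = [0.05, 0.10, 0.20]
evidence, ranking = aggregate_and_rank(D, q)
# hypothesis 0, rejected by all three studies, ranks first
```

A global thresholding step would then walk down this ranking and reject the top hypotheses up to a cutoff chosen to respect the prespecified overall FDR level; the paper's theoretical guarantee concerns that cutoff rule, which is not reproduced here.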