🤖 AI Summary
Traditional partial conjunction (PC) tests suffer from inflated Type I error rates when the combined studies are dependent, for example through sample overlap (shared individuals) in genome-wide association studies (GWAS). To address this, we propose e-Filter, a method that provably controls the family-wise error rate (FWER) or the false discovery rate (FDR) for partial conjunction hypotheses under *unknown* inter-study dependence. Grounded in e-value theory, e-Filter adopts a two-stage “filter-then-select” framework. A filtering step retains the most promising PC hypotheses, and a selection step declares a retained hypothesis a discovery when its e-value exceeds a selection threshold. This design avoids explicitly modeling the dependence while still assessing the cross-study replicability of genetic signals. In simulations, e-Filter achieves substantial power gains over competing methods. In a GWAS replicability study of low-density lipoprotein cholesterol (LDL-C), where the participating studies have varying sample overlap, a pathway enrichment analysis shows that the signals replicated by e-Filter are more strongly enriched in biologically relevant LDL-C pathways than those of competing approaches. By enabling valid inference without specifying the dependence structure, e-Filter provides a principled framework for replicability assessment across dependent studies.
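To illustrate why dependence matters, the Monte Carlo sketch below (an illustration, not taken from the paper) applies Fisher's combination, which is the classical PC p-value for u = 1 and assumes independent studies, to n = 4 studies that are all null but whose z-statistics share a common correlation ρ. Modeling sample overlap as equicorrelated standard normal statistics is a simplifying assumption, and the helper names (`fisher_pvalue`, `fisher_type1_rate`, etc.) are hypothetical. With ρ = 0 the rejection rate should sit near the nominal α = 0.05; as ρ grows it rises well above α.

```python
import math
import random


def one_sided_p(z: float) -> float:
    """Upper-tail p-value 1 - Phi(z) for a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chi2_sf_even_df(x: float, n: int) -> float:
    """P(chi^2 with 2n degrees of freedom > x), closed form for even df:
    exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return math.exp(-half) * total


def fisher_pvalue(pvals: list[float]) -> float:
    """Fisher's combination test; its null distribution assumes independence."""
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return chi2_sf_even_df(stat, len(pvals))


def fisher_type1_rate(n_studies: int, rho: float, alpha: float = 0.05,
                      reps: int = 20000, seed: int = 1) -> float:
    """Monte Carlo rejection rate of Fisher's test when every study is null
    and the z-statistics are equicorrelated with correlation rho
    (a stand-in for sample overlap; an assumption of this sketch)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)  # component common to all studies
        zs = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
              for _ in range(n_studies)]
        if fisher_pvalue([one_sided_p(z) for z in zs]) <= alpha:
            rejections += 1
    return rejections / reps


for rho in (0.0, 0.5, 0.9):
    print(f"rho={rho}: estimated type I error {fisher_type1_rate(4, rho):.3f}")
```

The inflation arises because Fisher's statistic counts each study as independent evidence, while positively correlated null statistics tend to be extreme together.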
📝 Abstract
Replicability is central to scientific progress, and the partial conjunction (PC) hypothesis testing framework provides an objective tool to quantify it across disciplines. Existing PC methods assume independent studies. Yet many modern applications, such as genome-wide association studies (GWAS) with sample overlap, violate this assumption, leading to dependence among study-specific summary statistics. Failure to account for this dependence can drastically inflate type I error rates when combining inferences. We propose e-Filter, a powerful procedure grounded in the theory of e-values. It involves a filtering step that retains a set of the most promising PC hypotheses, and a selection step in which the retained PC hypotheses are marked as discoveries whenever their e-values exceed a selection threshold. We establish the validity of e-Filter for family-wise error rate (FWER) and false discovery rate (FDR) control under unknown study dependence. A comprehensive simulation study demonstrates substantial power gains over competing methods. We apply e-Filter to a GWAS replicability study to identify consistent genetic signals for low-density lipoprotein cholesterol (LDL-C). Here, the participating studies exhibit varying levels of sample overlap, rendering existing methods unsuitable for combining inferences. A subsequent pathway enrichment analysis shows that the signals replicated by e-Filter are more strongly enriched in biologically relevant LDL-C pathways than those of competing approaches.
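The abstract does not spell out how e-Filter constructs its e-values, filters, or sets its selection threshold, so the sketch below does not reproduce e-Filter. Under stated assumptions, it shows standard e-value building blocks that remain valid under arbitrary dependence, the property that makes e-values attractive in this setting: (i) converting each study's p-value to an e-value with the calibrator e = κ p^(κ - 1), with κ = 0.5 as an arbitrary choice; (ii) an e-value for the PC null H^{u/n} ("fewer than u of the n studies are non-null"), assumed here to be the mean of the n - u + 1 smallest study e-values; and (iii) selection by e-BH for FDR or by an e-value Bonferroni rule for FWER. The filtering step is omitted, and all function names are hypothetical.

```python
def p_to_e(p: float, kappa: float = 0.5) -> float:
    """Calibrate a p-value into an e-value: e = kappa * p**(kappa - 1).
    The calibrator integrates to 1 over p in (0, 1], so a valid p-value
    yields an e-value with expectation at most 1 under the null."""
    return kappa * p ** (kappa - 1.0)


def pc_evalue(evalues: list[float], u: int) -> float:
    """Assumed PC e-value for H^{u/n}: mean of the n - u + 1 smallest e-values.
    Under H^{u/n} at least n - u + 1 studies are null, and this mean is no
    larger than the mean of any n - u + 1 null e-values, so its expectation
    is at most 1 whatever the dependence between studies."""
    k = len(evalues) - u + 1
    return sum(sorted(evalues)[:k]) / k


def e_bh(evalues: list[float], alpha: float) -> list[int]:
    """e-BH: reject the k* hypotheses with the largest e-values, where
    k* = max{k : e_(k) >= m / (alpha * k)}. Controls FDR at alpha under
    arbitrary dependence among the e-values."""
    m = len(evalues)
    order = sorted(range(m), key=lambda i: evalues[i], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        if evalues[order[k - 1]] >= m / (alpha * k):
            k_star = k
    return sorted(order[:k_star])


def e_bonferroni(evalues: list[float], alpha: float) -> list[int]:
    """FWER control: reject H_i when e_i >= m / alpha (Markov's inequality
    plus a union bound over the null hypotheses)."""
    m = len(evalues)
    return [i for i, e in enumerate(evalues) if e >= m / alpha]


# Hypothetical toy input: 3 PC hypotheses (e.g. loci), p-values from 3 studies each.
pvals = [
    [1e-10, 1e-9, 1e-8],  # strong signal in all three studies
    [1e-10, 0.6, 0.8],    # strong signal in one study only
    [0.3, 0.5, 0.9],      # no signal
]
pc = [pc_evalue([p_to_e(p) for p in row], u=2) for row in pvals]
print(e_bh(pc, alpha=0.1))          # -> [0]
print(e_bonferroni(pc, alpha=0.1))  # -> [0]
```

On this toy input only the first hypothesis is selected; the second is not, because a single strong study cannot establish replicability at u = 2.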