Two-Sample Testing with Block-Wise Missingness in Multi-Source Data

📅 2025-08-24
🤖 AI Summary
Block-wise missingness, particularly under non-ignorable missingness mechanisms, is prevalent in multi-source data and can bias conventional two-sample tests or reduce their statistical power. To address this, we propose the Block-Pattern Enhanced Test (BPET), a general two-sample testing framework that accounts for block-wise missingness directly, without imputation or sample deletion. Its concrete instantiation is the Block-wise Rank In Similarity graph Edge-count (BRISE) test, which extends rank-based similarity graph methods to data with block-wise missingness. Under mild conditions, the null distribution of BRISE converges to a chi-squared distribution. Simulations and analyses of two real-world datasets show that BRISE controls the Type I error rate and achieves good statistical power across a wide range of alternatives.

📝 Abstract
Multi-source and multi-modal datasets are increasingly common in scientific research, yet they often exhibit block-wise missingness, where entire data sources or modalities are systematically absent for subsets of subjects. This structured form of missingness presents significant challenges for statistical analysis, particularly for two-sample hypothesis testing. Standard approaches such as imputation or complete-case analysis can introduce bias or result in substantial information loss, especially when the missingness mechanism is not random. To address this methodological gap, we propose the Block-Pattern Enhanced Test (BPET), a general framework for two-sample testing that directly accounts for block-wise missingness without requiring imputation or deletion of observations. As a concrete instantiation, we develop the Block-wise Rank In Similarity graph Edge-count (BRISE) test, which extends rank-based similarity graph methods to settings with block-wise missing data. Under mild conditions, we establish that the null distribution of BRISE converges to a chi-squared distribution. Simulation studies show that BRISE consistently controls the type I error rate and achieves good statistical power under a wide range of alternatives. Applications to two real-world datasets with block-wise missingness further demonstrate the practical utility of our method in identifying meaningful distributional differences.
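To make the notion of block-wise missingness concrete, the following is a minimal sketch of how one might record, for each subject, which data sources are observed. The function name `block_patterns`, the column layout, and the toy values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def block_patterns(X, blocks):
    """Return one observed/missing flag per (subject, block).

    X      : (n_subjects, n_features) array, np.nan marks a missing value.
    blocks : list of column-index lists, one per data source
             (a hypothetical layout chosen for this example).
    """
    X = np.asarray(X, dtype=float)
    # A block counts as observed for a subject only if none of its columns is NaN.
    return np.column_stack(
        [~np.isnan(X[:, cols]).any(axis=1) for cols in blocks]
    )

# Toy data: source A is columns 0-1, source B is columns 2-3.
X = np.array([
    [0.1, 1.2, 3.0, 0.5],        # both sources observed
    [0.4, 0.9, np.nan, np.nan],  # source B missing
    [np.nan, np.nan, 2.2, 0.7],  # source A missing
    [0.3, 1.1, np.nan, np.nan],  # source B missing
])
blocks = [[0, 1], [2, 3]]

patterns = block_patterns(X, blocks)     # shape (4, 2), dtype bool
complete_cases = patterns.all(axis=1)    # subjects with every source observed
print(patterns)
print("complete cases:", int(complete_cases.sum()), "of", len(X))
```

In this toy example only one of four subjects is a complete case, which illustrates why complete-case analysis can discard most of the sample when whole sources are absent.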
Problem

Research questions and friction points this paper is trying to address.

Addresses two-sample testing with block-wise missing data
Avoids the bias and information loss that imputation and complete-case analysis can introduce
Extends rank-based methods to structured missingness scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Block-Pattern Enhanced Test (BPET) framework
BRISE test: a rank-based similarity graph edge-count test extended to block-wise missing data
Handles block-wise missingness without imputation
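
The general idea of a graph-based edge-count test that needs neither imputation nor deletion can be sketched as follows. This is not the BRISE statistic: it is a generic k-nearest-neighbor edge-count test with a permutation null (rather than the chi-squared limit established in the paper), using distances computed only over features that two subjects both have observed. All function names, the distance choice, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def pairwise_observed_distance(X):
    """Mean squared difference over features observed in BOTH rows (inf if none)."""
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)
    Z = np.where(obs, X, 0.0)          # zero-fill only to make the algebra work
    O = obs.astype(float)
    shared = O @ O.T                   # number of jointly observed features
    # sum over shared features of (x_i - x_j)^2, expanded term by term
    sq = (Z**2) @ O.T + O @ (Z**2).T - 2.0 * (Z @ Z.T)
    sq = np.maximum(sq, 0.0)           # guard against tiny negative round-off
    with np.errstate(divide="ignore", invalid="ignore"):
        D = np.where(shared > 0, sq / shared, np.inf)
    np.fill_diagonal(D, np.inf)        # no self-edges
    return D

def knn_edges(D, k):
    """Undirected k-NN graph; pairs with no shared features are never linked."""
    n = D.shape[0]
    nbrs = np.argsort(D, axis=1)[:, :k]
    edges = {
        (min(i, int(j)), max(i, int(j)))
        for i in range(n) for j in nbrs[i] if np.isfinite(D[i, j])
    }
    return np.array(sorted(edges))

def edge_count_test(X, labels, k=5, n_perm=999, seed=0):
    """Permutation test: few between-sample edges suggests different distributions."""
    labels = np.asarray(labels)
    E = knn_edges(pairwise_observed_distance(X), k)
    between = lambda lab: int(np.sum(lab[E[:, 0]] != lab[E[:, 1]]))
    observed = between(labels)
    rng = np.random.default_rng(seed)
    perm = np.array([between(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm <= observed)) / (n_perm + 1)
    return observed, float(p_value)

# Simulated example: two samples of 40 subjects, mean shift of 3 in every feature.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(3, 1, (40, 4))])
labels = np.repeat([0, 1], 40)
for start in (0, 40):                      # same missingness layout in both samples
    X[start:start + 10, 2:] = np.nan       # source B (columns 2-3) missing
    X[start + 10:start + 20, :2] = np.nan  # source A (columns 0-1) missing

observed, p_value = edge_count_test(X, labels)
print("between-sample edges:", observed, "p-value:", p_value)
```

The permutation null is used here only because it requires no distributional theory; in this toy setup the missingness layout is identical in both samples, so the sketch does not exercise the non-random missingness scenarios that the paper targets.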