🤖 AI Summary
This study addresses a key challenge in semi-supervised two-sample testing: how to use abundant unlabeled covariates to increase test power, given that incorporating them can break the exchangeability under the null hypothesis on which standard calibration relies. The authors propose a semi-supervised framework that, for the first time in two-sample testing, integrates covariate information while keeping the test statistic asymptotically normal under the null. This makes calibration straightforward against a normal reference distribution. Combining kernel methods, semi-supervised learning, and asymptotic analysis, they prove that the test is consistent against both fixed and local alternatives and that its asymptotic power is often much higher than that of existing kernel two-sample tests that ignore covariates. Simulations support these theoretical findings.
📝 Abstract
We consider the problem of two-sample testing in a semi-supervised setting with abundant unlabeled covariate data. Standard two-sample tests neglect covariate information, which has the potential to significantly boost performance. However, incorporating covariates can break the exchangeability assumption under the null, which complicates calibration procedures, such as permutation tests, that rely on it. To address these issues, we propose a semi-supervised method that produces an asymptotically normal test statistic while effectively integrating additional information from covariates. Our test is straightforward to calibrate because of this asymptotic normality under the null, and its asymptotic power is often much higher than that of existing kernel tests without covariates. Furthermore, we formally show that the proposed test is consistent against both fixed and local alternatives. Simulations confirm the practical and theoretical strengths of our approach.
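To illustrate what "calibrating via asymptotic normality" means for a kernel two-sample test, here is a minimal sketch. It is not the authors' semi-supervised method and uses no covariates. Instead, it shows a generic sample-split kernel test chosen for illustration. The first half of each sample defines a kernel witness function. The held-out half evaluates that function, producing a Welch-type studentized statistic that is approximately N(0, 1) under H0: P = Q. Because of this, the test rejects by comparing the statistic to a normal quantile rather than running permutations. The function names (`gaussian_kernel`, `split_witness_test`), the Gaussian kernel, the fixed bandwidth, and the 50/50 split are all assumptions for this example.

```python
import numpy as np
from statistics import NormalDist


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def split_witness_test(x, y, bandwidth=1.0, alpha=0.05, seed=0):
    """Hypothetical, illustrative normal-calibrated kernel two-sample test.

    Does NOT use covariates and is not the method from the abstract.
    x: array of shape (n, d); y: array of shape (m, d).
    Returns (statistic, one-sided p-value, reject flag).
    """
    rng = np.random.default_rng(seed)
    # Shuffle, then split each sample into a "fit" half and a "test" half.
    x = x[rng.permutation(len(x))]
    y = y[rng.permutation(len(y))]
    x1, x2 = np.array_split(x, 2)
    y1, y2 = np.array_split(y, 2)

    def witness(z):
        # f(z) = mean_k k(z, x1_k) - mean_k k(z, y1_k), built from the fit halves only.
        return (gaussian_kernel(z, x1, bandwidth).mean(axis=1)
                - gaussian_kernel(z, y1, bandwidth).mean(axis=1))

    # Evaluate the fixed witness on held-out points: two independent samples.
    a, b = witness(x2), witness(y2)
    # Welch-type studentization: conditional on the fit halves, the CLT makes
    # t_stat approximately N(0, 1) under H0, since E[f(X)] = E[f(Y)] when P = Q.
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    t_stat = float((a.mean() - b.mean()) / se)
    # One-sided: under an alternative the mean difference tends to be positive
    # (its expectation equals the squared MMD).
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return t_stat, p_value, p_value < alpha
```

For example, `split_witness_test(x, y)` with `x` and `y` drawn from Gaussians whose means differ should return a large statistic and reject. With `x` and `y` drawn from the same distribution, the rejection rate should stay near `alpha`. Splitting the sample trades some power for a simple normal null distribution. According to the abstract, the proposed method likewise relies on asymptotic normality for calibration, but it also exploits unlabeled covariates to gain power.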