AI Summary
This work addresses the substantial power loss that classical methods can suffer in high-dimensional two-sample homogeneity testing. The authors propose the neural Wasserstein test, a procedure based on the projection 1-Wasserstein distance. The method learns low-dimensional projection directions via manifold optimization and a witness function via deep neural networks. To adapt to unknown projection dimensions and sparsity levels, it aggregates a collection of candidate statistics through a max-type construction, which avoids explicit tuning. Under the null hypothesis, the aggregated statistic converges to the absolute maximum of a standard Gaussian vector, so calibration is asymptotically pivotal and requires no resampling. The theoretical analysis establishes the validity and consistency of the test, together with a Berry-Esseen type bound for the Gaussian approximation. Simulation studies and a real-data example indicate strong finite-sample performance.
Abstract
The two-sample homogeneity testing problem is fundamental in statistics and becomes particularly challenging in high dimensions, where classical tests can suffer substantial power loss. We develop a learning-assisted procedure based on the projection 1-Wasserstein distance, which we call the neural Wasserstein test. The method is motivated by the observation that there often exists a low-dimensional projection under which the two high-dimensional distributions differ. In practice, we learn the projection directions via manifold optimization and a witness function using deep neural networks. To adapt to unknown projection dimensions and sparsity levels, we aggregate a collection of candidate statistics through a max-type construction, avoiding explicit tuning while potentially improving power. We establish the validity and consistency of the proposed test and prove a Berry-Esseen type bound for the Gaussian approximation. In particular, under the null hypothesis, the aggregated statistic converges to the absolute maximum of a standard Gaussian vector, yielding an asymptotically pivotal (distribution-free) calibration that bypasses resampling. Simulation studies and a real-data example demonstrate the strong finite-sample performance of the proposed method.
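
To illustrate why a pivotal limit removes the need for resampling, here is a minimal sketch of the calibration step only. It assumes that the K candidate statistics have already been standardized and that "standard Gaussian vector" means K independent N(0, 1) components, so that P(max_k |Z_k| <= t) = (2Φ(t) - 1)^K. The function names and the example statistic values are hypothetical; the sketch does not reproduce the paper's projection learning or witness-function training.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def max_abs_gaussian_critical_value(num_stats: int, alpha: float = 0.05) -> float:
    """Upper-alpha quantile of max_k |Z_k| for K independent N(0, 1) variables.

    Assumption (not taken from the paper): the limiting Gaussian vector
    has identity covariance, so the maximum has a closed-form CDF.
    """
    if num_stats < 1:
        raise ValueError("num_stats must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Solve (2 * Phi(t) - 1) ** K = 1 - alpha for t.
    per_coordinate = (1.0 - alpha) ** (1.0 / num_stats)
    return _STD_NORMAL.inv_cdf((1.0 + per_coordinate) / 2.0)


def max_type_test(standardized_stats, alpha: float = 0.05):
    """Aggregate standardized candidate statistics by their absolute maximum.

    Returns (aggregated statistic, critical value, p-value, reject flag).
    """
    stats = list(standardized_stats)
    if not stats:
        raise ValueError("at least one candidate statistic is required")
    num_stats = len(stats)
    aggregated = max(abs(s) for s in stats)
    critical = max_abs_gaussian_critical_value(num_stats, alpha)
    # p-value from the same closed-form CDF of the maximum.
    p_value = 1.0 - (2.0 * _STD_NORMAL.cdf(aggregated) - 1.0) ** num_stats
    return aggregated, critical, p_value, aggregated > critical


if __name__ == "__main__":
    # Hypothetical standardized statistics, one per candidate configuration.
    example_stats = [0.3, -3.1, 1.2]
    agg, crit, pval, reject = max_type_test(example_stats, alpha=0.05)
    print(f"statistic={agg:.3f} critical={crit:.3f} p={pval:.4f} reject={reject}")
```

Because the critical value depends only on K and the level, it can be computed once in closed form, whereas a permutation or bootstrap calibration would require re-evaluating the statistics, and typically retraining the networks, on every resample.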