Neural Wasserstein Two-Sample Tests

2026-01-29
Citations: 0
Influential: 0
AI Summary
This work addresses the substantial loss of power that classical tests can suffer in high-dimensional two-sample homogeneity testing. The authors propose the neural Wasserstein test, a learning-assisted procedure based on the projection 1-Wasserstein distance. The method learns low-dimensional projection directions via manifold optimization and a witness function using deep neural networks, and it aggregates a collection of candidate statistics through a max-type construction to adapt to unknown projection dimensions and sparsity levels without explicit tuning. Theoretical analysis establishes the validity and consistency of the test, together with a Berry-Esseen type bound for the Gaussian approximation. Under the null hypothesis, the aggregated statistic converges to the absolute maximum of a standard Gaussian vector, which gives an asymptotically pivotal calibration that bypasses resampling. Simulation studies and a real-data example demonstrate strong finite-sample performance.

Abstract
The two-sample homogeneity testing problem is fundamental in statistics and becomes particularly challenging in high dimensions, where classical tests can suffer substantial power loss. We develop a learning-assisted procedure based on the projection 1-Wasserstein distance, which we call the neural Wasserstein test. The method is motivated by the observation that there often exists a low-dimensional projection under which the two high-dimensional distributions differ. In practice, we learn the projection directions via manifold optimization and a witness function using deep neural networks. To adapt to unknown projection dimensions and sparsity levels, we aggregate a collection of candidate statistics through a max-type construction, avoiding explicit tuning while potentially improving power. We establish the validity and consistency of the proposed test and prove a Berry-Esseen type bound for the Gaussian approximation. In particular, under the null hypothesis, the aggregated statistic converges to the absolute maximum of a standard Gaussian vector, yielding an asymptotically pivotal (distribution-free) calibration that bypasses resampling. Simulation studies and a real-data example demonstrate the strong finite-sample performance of the proposed method.
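
The calibration step described in the abstract can be illustrated with a minimal sketch. It assumes the K candidate statistics have already been computed and standardized, and that "standard Gaussian vector" means K independent N(0, 1) coordinates, so that P(max |Z_k| <= c) = (2*Phi(c) - 1)^K. The function names below are hypothetical and are not taken from the paper; the sketch covers only the final max-type calibration, not the learning of projections or witness functions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def max_abs_gaussian_critical_value(num_stats: int, alpha: float = 0.05) -> float:
    """Level-alpha critical value for max_k |Z_k|, Z ~ N(0, I_K) (assumed null limit)."""
    # Solve (2 * Phi(c) - 1) ** K = 1 - alpha for c.
    per_coordinate = (1.0 - alpha) ** (1.0 / num_stats)
    return _STD_NORMAL.inv_cdf((1.0 + per_coordinate) / 2.0)


def max_abs_gaussian_p_value(observed_max: float, num_stats: int) -> float:
    """P(max_k |Z_k| >= observed_max) under the same assumed null limit."""
    per_coordinate = 2.0 * _STD_NORMAL.cdf(abs(observed_max)) - 1.0
    return 1.0 - per_coordinate ** num_stats


def aggregated_test(standardized_stats, alpha: float = 0.05):
    """Max-type aggregation of pre-standardized candidate statistics (hypothetical helper)."""
    stats = list(standardized_stats)
    if not stats:
        raise ValueError("at least one candidate statistic is required")
    t_max = max(abs(z) for z in stats)
    p_value = max_abs_gaussian_p_value(t_max, len(stats))
    reject = t_max > max_abs_gaussian_critical_value(len(stats), alpha)
    return t_max, p_value, reject
```

Because the critical value depends only on K and alpha, no permutation or bootstrap resampling is needed in this sketch, which is the practical meaning of an asymptotically pivotal calibration.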
Problem

Research questions and friction points this paper is trying to address.

two-sample test
high-dimensional statistics
homogeneity testing
Wasserstein distance
statistical power
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Wasserstein test
projection Wasserstein distance
manifold optimization
max-type aggregation
asymptotically pivotal calibration
Xiaoyu Hu
Department of Statistics and Data Science, Xi'an Jiaotong University
Zhenhua Lin
National University of Singapore
Functional data analysis, High-dimensional data analysis, non-Euclidean data analysis