🤖 AI Summary
Existing distribution closeness tests (DCTs) predominantly rely on total variation distance in one-dimensional discrete spaces, rendering them ill-suited for high-dimensional complex data such as images. Although the Maximum Mean Discrepancy (MMD) generalizes to high dimensions, its standard formulation assigns identical values to distribution pairs with differing RKHS norms, resulting in limited discriminative power. To address this, we propose Norm-Adaptive MMD (NAMMD), which normalizes MMD by the RKHS norm of the mean embeddings, thereby substantially enhancing sensitivity to distributional differences and statistical test power. We theoretically establish that the NAMMD-based DCT achieves strictly higher asymptotic statistical power while maintaining a controllable Type-I error bound. Extensive experiments on synthetic noisy data and real-world image datasets demonstrate that our method significantly outperforms baseline DCT approaches in both detection accuracy and robustness.
📝 Abstract
Distribution closeness testing (DCT) assesses whether the distance between a pair of distributions is at least $\epsilon$-far. Existing DCT methods mainly measure discrepancies between distributions defined on discrete one-dimensional spaces (e.g., using total variation), which limits their applicability to complex data (e.g., images). To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measure of the discrepancy between two complex distributions, into DCT scenarios. However, we find that MMD's value can be the same for many pairs of distributions that have different norms in the same reproducing kernel Hilbert space (RKHS), making MMD less informative when assessing the closeness levels of multiple distribution pairs. To mitigate this issue, we design a new measure of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales MMD's value using the RKHS norms of the distributions. Based on the asymptotic distribution of NAMMD, we then propose the NAMMD-based DCT to assess the closeness level of a distribution pair. Theoretically, we prove that the NAMMD-based DCT has higher test power than the MMD-based DCT, with a bounded type-I error, which is also validated by extensive experiments on many types of data (e.g., synthetic noise, real images). Furthermore, we apply the proposed NAMMD to the two-sample testing problem and find that the NAMMD-based two-sample test has higher test power than the MMD-based two-sample test, both in theory and in experiments.
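To make the core idea concrete, here is a minimal sketch of the quantities the abstract describes: a plug-in estimate of MMD$^2$ between two samples under an RBF kernel, plus a norm-adaptive rescaling. The exact normalization NAMMD uses is not given in the abstract; the denominator $4K - \|\mu_P\|^2 - \|\mu_Q\|^2$ below (with $K$ an upper bound on the kernel) is only one plausible way to "scale MMD's value using the RKHS norms of the distributions", and the function name `mmd2_and_nammd` is ours, not the paper's.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix; values are bounded by K = 1.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_and_nammd(X, Y, sigma=1.0, K=1.0):
    """Plug-in (biased) MMD^2 estimate and a hedged NAMMD-style rescaling."""
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Biased V-statistic: ||mu_P - mu_Q||^2 in the RKHS (always >= 0).
    mmd2 = Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()
    # Squared RKHS norms of the mean embeddings: ||mu_P||^2 = E[k(x, x')],
    # estimated by the means of the within-sample kernel matrices.
    norm_p, norm_q = Kxx.mean(), Kyy.mean()
    # Hypothetical normalization (our assumption, not the paper's exact
    # formula): divide by 4K - ||mu_P||^2 - ||mu_Q||^2, which is positive
    # for a kernel bounded by K, so pairs with larger embedding norms get
    # a larger scaled discrepancy for the same raw MMD value.
    nammd = mmd2 / (4 * K - norm_p - norm_q)
    return mmd2, nammd
```

For example, two Gaussian samples with shifted means yield a positive MMD$^2$, and the rescaled value stays bounded because the denominator lies in $[2K, 4K]$; an actual test would compare the statistic against a threshold derived from its asymptotic null distribution, which this sketch does not implement.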