🤖 AI Summary
This work addresses two-sample testing without distributional assumptions, a setting in which the total variation (TV) distance cannot be reliably bounded from above. The authors propose the blurred total variation (blurred TV) distance as a relaxation of TV that permits distribution-free inference, and establish rigorous distribution-free upper and lower bounds for this quantity. They further examine its properties in high-dimensional regimes. Building on this distance, they develop a nonparametric two-sample testing framework that integrates measure smoothing with minimax analysis, yielding a distribution-free test with theoretical guarantees for high-dimensional data.
📝 Abstract
Two-sample testing, where we aim to determine whether two distributions are equal based on samples from each, is challenging if we cannot place assumptions on the properties of the two distributions. In particular, certifying equality of distributions, or even providing a tight upper bound on the total variation (TV) distance between them, is impossible in a distribution-free regime. In this work, we examine the blurred TV distance, a relaxation of TV distance that enables us to perform inference without assumptions on the distributions. We provide theoretical guarantees for distribution-free upper and lower bounds on the blurred TV distance, and examine its properties in high dimensions.
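For orientation, the first line below is the standard definition of the TV distance between distributions $P$ and $Q$ on a common space. The second line is only a sketch of what a blurred version might look like, under the assumption (not stated in the abstract) that blurring means smoothing both distributions with the same kernel $K_h$ of bandwidth $h$. The symbols $K_h$, $h$, and $d_{\mathrm{TV}}^{h}$ are illustrative notation, not necessarily the authors'.

$$
\begin{aligned}
d_{\mathrm{TV}}(P, Q) &= \sup_{A} \, \lvert P(A) - Q(A) \rvert, \\
d_{\mathrm{TV}}^{h}(P, Q) &= d_{\mathrm{TV}}(P * K_h,\; Q * K_h) \;\le\; d_{\mathrm{TV}}(P, Q).
\end{aligned}
$$

Under that assumption, the inequality follows from the data-processing inequality for TV: passing both distributions through the same kernel cannot increase their TV distance, which is one sense in which such a quantity would be a relaxation of TV.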