Integral Probability Metrics Meet Neural Networks: The Radon-Kolmogorov-Smirnov Test

📅 2023-09-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Two-sample testing in multiple dimensions lacks a statistic that is both interpretable and easy to optimize: classical tests such as Kolmogorov-Smirnov do not extend naturally beyond one dimension. Method: This paper proposes the Radon-Kolmogorov-Smirnov (RKS) test, the integral probability metric (IPM) obtained by taking the function class to be the unit ball in the Radon bounded-variation (RBV) space of a given smoothness degree $k \geq 0$, building on the connection between RBV functions and neural networks due to Parhi and Nowak (2021, 2023). The paper proves that the witness, the function achieving the maximum mean difference, is always a ridge spline of degree $k$, i.e., a single neuron, so the RKS criterion can be (approximately) maximized with modern neural network toolkits. It also derives the asymptotic null distribution of the test statistic and shows the test has asymptotically full power against any fixed pair of distinct distributions $P \neq Q$. Results: Experiments compare the RKS test with the kernel maximum mean discrepancy (MMD) test and map out the strengths and weaknesses of each.
📝 Abstract
Integral probability metrics (IPMs) constitute a general class of nonparametric two-sample tests that are based on maximizing the mean difference between samples from one distribution $P$ versus another $Q$, over all choices of data transformations $f$ living in some function space $\mathcal{F}$. Inspired by recent work that connects what are known as functions of $\textit{Radon bounded variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study the IPM defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness degree $k \geq 0$. This test, which we refer to as the $\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness. It is also intimately connected to neural networks: we prove that the witness in the RKS test -- the function $f$ achieving the maximum mean difference -- is always a ridge spline of degree $k$, i.e., a single neuron in a neural network. We can thus leverage the power of modern neural network optimization toolkits to (approximately) maximize the criterion that underlies the RKS test. We prove that the RKS test has asymptotically full power at distinguishing any distinct pair $P \neq Q$ of distributions, derive its asymptotic null distribution, and carry out experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test.
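For reference, the IPM at the heart of the test takes the standard form below (a paraphrase in the abstract's notation, not a quotation from the paper):

$$ d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}} \; \Big|\, \mathbb{E}_{X \sim P} f(X) - \mathbb{E}_{Y \sim Q} f(Y) \,\Big|, $$

with $\mathcal{F}$ the unit ball in the RBV space of degree $k$; in practice the expectations are replaced by sample means over the two samples. The paper's structural result says the supremum is attained by a degree-$k$ ridge spline, i.e., a single neuron of the form $f(x) = (w^\top x - b)_+^k$ (this explicit neuron form is our gloss on "ridge spline", following Parhi and Nowak).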
Problem

Research questions and friction points this paper is trying to address.

How to generalize the Kolmogorov-Smirnov test to multiple dimensions and higher orders of smoothness
How to connect neural networks with Radon bounded variation (RBV) function spaces
How to build a nonparametric two-sample test that can be optimized with neural network tooling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging neural network optimization for the Radon-Kolmogorov-Smirnov (RKS) test
Using degree-$k$ ridge splines (single neurons) as the witness functions in the IPM framework
Applying neural optimization toolkits to distribution testing (a minimal sketch follows this list)
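To make the last point concrete, here is a minimal sketch of approximating the RKS statistic by gradient ascent over a single neuron $f(x) = (w^\top x - b)_+^k$ with $\|w\|_2 = 1$. It assumes PyTorch and $k \geq 1$; the function name, hyperparameters, and the unit-norm parametrization are illustrative choices, not the authors' implementation.

```python
import torch

def rks_statistic(X, Y, k=1, n_steps=500, lr=0.05):
    """Approximate the RKS criterion: maximize |mean_X f - mean_Y f|
    over single neurons f(x) = relu(w.x - b)**k with ||w||_2 = 1."""
    d = X.shape[1]
    v = torch.randn(d, requires_grad=True)   # unnormalized direction
    b = torch.zeros(1, requires_grad=True)   # neuron offset
    opt = torch.optim.Adam([v, b], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        w = v / v.norm()                     # unit-norm weight vector
        gap = (torch.relu(X @ w - b).pow(k).mean()
               - torch.relu(Y @ w - b).pow(k).mean())
        (-gap.abs()).backward()              # ascend the |mean difference|
        opt.step()
    with torch.no_grad():
        w = v / v.norm()
        gap = (torch.relu(X @ w - b).pow(k).mean()
               - torch.relu(Y @ w - b).pow(k).mean())
    return gap.abs().item()

# Toy usage: P is a mean-shifted Gaussian, Q is standard normal
torch.manual_seed(0)
X = torch.randn(200, 5) + 0.3
Y = torch.randn(200, 5)
print(rks_statistic(X, Y, k=1))
```

The criterion is nonconvex in $(w, b)$, so several random restarts (keeping the largest value) are a sensible default, and calibration, e.g., by permutation, is needed to turn the raw statistic into a test.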
👥 Authors
Seung-Jin Paik
University of California, Berkeley
Michael Celentano
Miller Fellow (Postdoc), University of California, Berkeley
Alden Green
Stanford University
R. Tibshirani
University of California, Berkeley