Kernel Selection is Model Selection: A Unified Complexity-Penalized Approach for MMD Two-Sample Tests

📅 2026-05-07

🤖 AI Summary

This work addresses how strongly the power of Maximum Mean Discrepancy (MMD) two-sample tests depends on the choice of kernel. Existing data-driven approaches either overfit, because they ignore the dependence that kernel optimization creates between the selected kernel and the test data, or rely on finite kernel grids that do not scale to continuous kernel spaces. The paper frames kernel selection as a model selection problem and introduces the Complexity-Penalized MMD (CP-MMD) criterion. Its complexity penalty is derived from uniform concentration inequalities for two-sample statistics and enters the optimization objective directly, so continuous kernel parameters (scalar bandwidths, polynomial-feature bandwidths, or deep network weights) can be tuned without a grid search. According to the authors, the method preserves Type I error control while matching or exceeding state-of-the-art test power across their experimental settings.
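For reference, the statistic tuned in this line of work is usually the unbiased MMD² estimate for samples x_1, …, x_m and y_1, …, y_n under a kernel k_θ with parameters θ in a search space Θ. The summary does not state the CP-MMD penalty, so pen_n(θ; Θ) below is a placeholder: the second display only contrasts the general shape of a ratio criterion of the kind used in earlier work with a generic complexity-penalized selection rule, and should not be read as the paper's exact formula.

```latex
\begin{aligned}
\widehat{\mathrm{MMD}}^2_u(k_\theta)
  &= \frac{1}{m(m-1)} \sum_{i \neq j} k_\theta(x_i, x_j)
   + \frac{1}{n(n-1)} \sum_{i \neq j} k_\theta(y_i, y_j)
   - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k_\theta(x_i, y_j), \\[4pt]
\hat\theta_{\mathrm{ratio}}
  &= \arg\max_{\theta \in \Theta}
     \frac{\widehat{\mathrm{MMD}}^2_u(k_\theta)}{\hat\sigma_{H_1}(k_\theta)},
\qquad
\hat\theta_{\mathrm{pen}}
  = \arg\max_{\theta \in \Theta}
    \Big[ \widehat{\mathrm{MMD}}^2_u(k_\theta) - \mathrm{pen}_n(\theta; \Theta) \Big].
\end{aligned}
```

Here \hat\sigma_{H_1} denotes an estimate of the statistic's standard deviation under the alternative; a small estimate inflates the ratio, which is one way the "variance collapse" mentioned in the abstract can arise on rich kernel classes.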

📝 Abstract

The Maximum Mean Discrepancy (MMD) is a cornerstone statistic for nonparametric two-sample testing, but its test power depends entirely on the chosen kernel. Because any fixed kernel fails to distinguish certain pairs of distributions, the kernel must be optimized on the data. However, data-driven optimization violates the i.i.d. assumption on which the test rests, forcing a strict trade-off in existing frameworks. Ratio criteria ignore this dependence, which induces overfitting and variance collapse on rich kernel classes. Aggregation methods instead sidestep the dependence by using finite kernel grids, but this strategy cannot scale to continuous search spaces such as deep kernels. To break this dichotomy, we cast data-driven kernel selection as a model selection problem. We propose Complexity-Penalized MMD (CP-MMD), a criterion derived by applying a two-sample uniform concentration inequality from prior work to the post-optimization MMD problem. The resulting penalty bounds the empirical MMD in terms of the complexity of the kernel search space, absorbing the cost of optimization into the criterion. CP-MMD therefore permits direct, grid-free maximization over continuous parametric classes, including scalar bandwidths, polynomial-feature bandwidths, and deep network parameters. By formally accounting for optimization complexity, we prove that CP-MMD maximizes true test power while ensuring unconditional Type I validity. Across linear, polynomial-feature, and deep regimes, CP-MMD matches or exceeds state-of-the-art test power.
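To make the select-then-test pipeline concrete, here is a minimal pure-Python sketch for 1-D data, assuming a Gaussian kernel with a single scalar bandwidth. It computes the standard unbiased MMD² estimate, maximizes a penalized objective over a continuous log-bandwidth with golden-section search (a stand-in optimizer, not necessarily what the paper uses), and then runs a permutation test. `hypothetical_penalty` is an illustrative placeholder, not the CP-MMD penalty, whose form is not given here, and all function names are hypothetical. So that the permutation test is valid without relying on the paper's theory, the sketch selects the bandwidth on one half of the data and tests on the other half.

```python
import math
import random


def kernel_matrix(points, sigma):
    """Gaussian (RBF) kernel matrix for 1-D points with bandwidth sigma."""
    return [[math.exp(-((p - q) ** 2) / (2.0 * sigma ** 2)) for q in points]
            for p in points]


def mmd2_from_matrix(K, idx_x, idx_y):
    """Unbiased MMD^2 estimate from a pooled kernel matrix and two index sets."""
    m, n = len(idx_x), len(idx_y)
    kxx = sum(K[i][j] for i in idx_x for j in idx_x if i != j) / (m * (m - 1))
    kyy = sum(K[i][j] for i in idx_y for j in idx_y if i != j) / (n * (n - 1))
    kxy = sum(K[i][j] for i in idx_x for j in idx_y) / (m * n)
    return kxx + kyy - 2.0 * kxy


def mmd2_unbiased(xs, ys, sigma):
    """Unbiased MMD^2 between 1-D samples xs and ys under a Gaussian kernel."""
    pooled = list(xs) + list(ys)
    m = len(xs)
    return mmd2_from_matrix(kernel_matrix(pooled, sigma),
                            range(m), range(m, len(pooled)))


def hypothetical_penalty(sigma, n, c=0.5):
    # HYPOTHETICAL placeholder: larger for smaller bandwidths (a "richer" kernel)
    # and shrinking like 1/sqrt(n). This is NOT the CP-MMD penalty from the paper.
    return c * math.sqrt((1.0 + math.log1p(1.0 / sigma)) / n)


def golden_section_max(f, lo, hi, iters=40):
    """Maximize a scalar function on [lo, hi], assuming it is unimodal there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def select_bandwidth(xs, ys, sigma_min=0.05, sigma_max=10.0):
    """Grid-free search over the log-bandwidth for the penalized objective."""
    n = min(len(xs), len(ys))

    def objective(log_sigma):
        sigma = math.exp(log_sigma)
        return mmd2_unbiased(xs, ys, sigma) - hypothetical_penalty(sigma, n)

    best = golden_section_max(objective, math.log(sigma_min), math.log(sigma_max))
    return math.exp(best)


def permutation_pvalue(xs, ys, sigma, n_perm=200, rng=None):
    """Permutation test; valid when sigma was chosen without using (xs, ys)."""
    rng = random.Random(0) if rng is None else rng
    pooled = list(xs) + list(ys)
    K = kernel_matrix(pooled, sigma)  # computed once, reused by every permutation
    m, total = len(xs), len(pooled)
    observed = mmd2_from_matrix(K, range(m), range(m, total))
    idx = list(range(total))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        if mmd2_from_matrix(K, idx[:m], idx[m:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.gauss(0.0, 1.0) for _ in range(80)]
    ys = [rng.gauss(1.0, 1.0) for _ in range(80)]  # mean-shifted alternative
    # Select the kernel on one half, then test on the held-out half.
    sigma = select_bandwidth(xs[:40], ys[:40])
    p = permutation_pvalue(xs[40:], ys[40:], sigma, rng=rng)
    print(f"selected sigma = {sigma:.3f}, permutation p-value = {p:.4f}")
```

The data split is the sketch's own conservative device for validity; the abstract instead claims unconditional Type I validity by accounting for optimization complexity in the criterion itself. Replacing the placeholder penalty with the paper's penalty, and golden-section search with gradient ascent over richer kernel parameters, would be the natural way to move this sketch toward the method described.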

Problem

Research questions and friction points this paper is trying to address.

Kernel Selection

Model Selection

Maximum Mean Discrepancy

Two-Sample Test

Data-Driven Optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximum Mean Discrepancy

Kernel Selection

Model Selection