(De)-regularized Maximum Mean Discrepancy Gradient Flow

📅 2024-09-23
🏛️ arXiv.org
📈 Citations: 10
Influential: 0
📄 PDF
🤖 AI Summary
Existing gradient flow methods for source-to-target distribution transport face a fundamental trade-off: f-divergence-based flows lack numerical tractability, while MMD-based flows require strong assumptions, such as explicit noise injection, to ensure convergence. This work proposes DrMMD, a tractable and robust gradient flow method that operates using only target samples. Its core innovation is a tunable, de-regularized link between the MMD and the χ²-divergence, which unifies near-global convergence guarantees with closed-form sample update rules. DrMMD combines a de-regularized kernel MMD, the Wasserstein gradient flow framework, and an adaptive de-regularization schedule, yielding convergence guarantees for general target distributions in both continuous- and discrete-time settings. Experiments, including a large-scale teacher–student neural network setting, validate its effectiveness, robustness, and scalability.

📝 Abstract
We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from a source distribution to a target distribution using only target samples either lack a tractable numerical implementation ($f$-divergence flows) or require strong assumptions and modifications, such as noise injection, to ensure convergence (Maximum Mean Discrepancy flows). In contrast, the DrMMD flow can simultaneously (i) guarantee near-global convergence for a broad class of targets in both continuous and discrete time, and (ii) be implemented in closed form using only samples. The former is achieved by leveraging the connection between the DrMMD and the $\chi^2$-divergence, while the latter comes from treating DrMMD as an MMD with a de-regularized kernel. Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime. The potential of the DrMMD flow is demonstrated across several numerical experiments, including a large-scale setting of training student/teacher networks.
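To make the "closed form using only samples" point concrete, the sketch below implements a plain (un-regularized) MMD particle gradient flow: each source particle follows an explicit Euler step along the negative gradient of the MMD witness function. This is background for the method, not the paper's DrMMD scheme; the Gaussian kernel, bandwidth, step size, and particle counts are illustrative assumptions.

```python
import numpy as np

def mean_kernel_grad(x, ys, sigma):
    # Gradient w.r.t. x of (1/m) * sum_j k(x, y_j) for a Gaussian kernel
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    diffs = x - ys                          # (m, d)
    sq = np.sum(diffs**2, axis=1)           # (m,)
    k = np.exp(-sq / (2.0 * sigma**2))      # (m,)
    return -(diffs * k[:, None]).mean(axis=0) / sigma**2

def mmd_flow_step(X, Y, step, sigma):
    # One explicit Euler step of the plain MMD flow: each particle x_i moves
    # against the gradient of the witness mean_j k(x, X_j) - mean_j k(x, Y_j).
    V = np.stack([
        mean_kernel_grad(x, X, sigma) - mean_kernel_grad(x, Y, sigma)
        for x in X
    ])
    return X - step * V

rng = np.random.default_rng(0)
X = rng.normal(-2.0, 0.5, size=(200, 2))    # source particles
Y = rng.normal(2.0, 0.5, size=(200, 2))     # target samples (all we observe)
for _ in range(500):
    # A broad bandwidth is needed here: with a narrow kernel the target pull
    # vanishes far from Y, the stalling issue that motivates modified flows.
    X = mmd_flow_step(X, Y, step=0.5, sigma=3.0)
print(X.mean(axis=0))                        # particle mean drifts toward Y's
```

The bandwidth comment illustrates why plain MMD flows need fixes such as noise injection or, as here, de-regularization: the witness gradient decays exponentially away from the target support, so convergence guarantees require more than the vanilla flow.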
Problem

Research questions and friction points this paper is trying to address.

Develops a gradient flow that transports samples from a source distribution to a target distribution
Ensures convergence for a broad class of targets with a sample-based, closed-form implementation
Uses adaptive de-regularization to trade off discretization error against deviation from the χ² regime
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the de-regularized MMD (DrMMD) and its Wasserstein gradient flow
Guarantees near-global convergence via the connection to the χ²-divergence
Implements closed-form sample updates with an adaptive de-regularization schedule
Zonghao Chen
Department of Computer Science, University College London, London, WC1V 6LJ, UK
Aratrika Mustafi
Department of Statistics, Pennsylvania State University, University Park, PA, 16802, USA
Pierre Glaser
Gatsby Computational Neuroscience Unit, UCL
Anna Korba
ENSAE/CREST
A. Gretton
Gatsby Computational Neuroscience Unit, University College London, London, WC1V 6LJ, UK
Bharath K. Sriperumbudur
Department of Statistics, Pennsylvania State University, University Park, PA, 16802, USA