(De)-regularized Maximum Mean Discrepancy Gradient Flow

📅 2024-09-23
🏛️ arXiv.org
📈 Citations: 10
Influential: 0
📄 PDF
🤖 AI Summary
Existing gradient flow methods for source-to-target distribution transport face a fundamental trade-off: f-divergence-based flows lack numerical tractability, while MMD-based flows require strong assumptions, such as explicit noise injection, to ensure convergence. This work proposes DrMMD, a tractable and robust gradient flow method that operates using only target samples. Its core innovation is a tunable, de-regularized link between the MMD and the χ²-divergence, which unifies near-global convergence guarantees with closed-form sample update rules. DrMMD combines a de-regularized kernel MMD, the Wasserstein gradient flow framework, and an adaptive de-regularization schedule, yielding convergence guarantees for general target distributions in both continuous- and discrete-time settings. Experiments, including a large-scale teacher–student neural network setting, validate its effectiveness, robustness, and scalability.

📝 Abstract
We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from a source distribution to a target distribution using only target samples either lack a tractable numerical implementation ($f$-divergence flows) or require strong assumptions and modifications, such as noise injection, to ensure convergence (Maximum Mean Discrepancy flows). In contrast, the DrMMD flow can simultaneously (i) guarantee near-global convergence for a broad class of targets in both continuous and discrete time, and (ii) be implemented in closed form using only samples. The former is achieved by leveraging the connection between the DrMMD and the $\chi^2$-divergence, while the latter comes from treating DrMMD as an MMD with a de-regularized kernel. Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime. The potential of the DrMMD flow is demonstrated across several numerical experiments, including a large-scale setting of training student/teacher networks.
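To make the "closed form using only samples" point concrete, the sketch below implements a plain (un-regularized) MMD particle gradient flow: each source particle follows an explicit Euler step along the negative gradient of the MMD witness function. This is background for the method, not the paper's DrMMD scheme; the Gaussian kernel, bandwidth, step size, and particle counts are illustrative assumptions.

```python
import numpy as np

def mean_kernel_grad(x, ys, sigma):
    # Gradient w.r.t. x of (1/m) * sum_j k(x, y_j) for a Gaussian kernel
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    diffs = x - ys                          # (m, d)
    sq = np.sum(diffs**2, axis=1)           # (m,)
    k = np.exp(-sq / (2.0 * sigma**2))      # (m,)
    return -(diffs * k[:, None]).mean(axis=0) / sigma**2

def mmd_flow_step(X, Y, step, sigma):
    # One explicit Euler step of the plain MMD flow: each particle x_i moves
    # against the gradient of the witness mean_j k(x, X_j) - mean_j k(x, Y_j).
    V = np.stack([
        mean_kernel_grad(x, X, sigma) - mean_kernel_grad(x, Y, sigma)
        for x in X
    ])
    return X - step * V

rng = np.random.default_rng(0)
X = rng.normal(-2.0, 0.5, size=(200, 2))    # source particles
Y = rng.normal(2.0, 0.5, size=(200, 2))     # target samples (all we observe)
for _ in range(500):
    # A broad bandwidth is needed here: with a narrow kernel the target pull
    # vanishes far from Y, the stalling issue that motivates modified flows.
    X = mmd_flow_step(X, Y, step=0.5, sigma=3.0)
print(X.mean(axis=0))                        # particle mean drifts toward Y's
```

The bandwidth comment illustrates why plain MMD flows need fixes such as noise injection or, as here, de-regularization: the witness gradient decays exponentially away from the target support, so convergence guarantees require more than the vanilla flow.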
Problem

Research questions and friction points this paper is trying to address.

Develops a gradient flow that transports samples from a source distribution to a target distribution
Ensures convergence for a broad class of targets with a sample-based, closed-form implementation
Uses adaptive de-regularization to trade off discretization error against deviation from the χ² regime
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the de-regularized MMD (DrMMD) and its Wasserstein gradient flow
Guarantees near-global convergence via the connection to the χ²-divergence
Implements closed-form sample updates with an adaptive de-regularization schedule
Zonghao Chen
Department of Computer Science, University College London, London, WC1V 6LJ, UK
Aratrika Mustafi
Department of Statistics, Pennsylvania State University, University Park, PA, 16802, USA
Pierre Glaser
Gatsby Computational Neuroscience Unit, UCL
Anna Korba
ENSAE/CREST
A. Gretton
Gatsby Computational Neuroscience Unit, University College London, London, WC1V 6LJ, UK
Bharath K. Sriperumbudur
Department of Statistics, Pennsylvania State University, University Park, PA, 16802, USA