🤖 AI Summary
This work addresses the $O(N^2)$ computational cost of simulating mean-field Langevin dynamics (MFLD) for entropy-regularized optimization over probability distributions. It introduces kernel thinning, a technique previously unexplored in this context, so that each particle interacts only with a coreset of $O(N^{1/2})$ particles, reducing the overall cost to $O(N^{3/2})$. The analysis combines discretization of the McKean–Vlasov process, particle approximation, and maximum mean discrepancy analysis, and shows that, under mild regularity conditions, the method retains the convergence guarantees of MFLD up to logarithmic factors. Experiments on student-teacher neural network training, quantization with maximum mean discrepancy, and computation of predictively oriented posteriors in a post-Bayesian framework agree closely with the theoretical predictions.
📝 Abstract
Several important learning tasks can be formulated as minimizing an entropy-regularized objective over an appropriate space of probability distributions. Mean-field Langevin dynamics (MFLD) facilitate computation in this general context, casting the minimizer as the invariant distribution of a McKean--Vlasov process, which can be numerically discretized using $N$ particles and thus simulated. However, simulating this interacting particle system has computational complexity of order $N^2$. Motivated by recent research into \emph{kernel thinning}, we propose \texttt{KT-MFLD}, in which each particle interacts only with a thinned particle coreset of size $\mathcal{O}(N^{\frac{1}{2}})$. \texttt{KT-MFLD} thus reduces the computational complexity to order $N^{\frac{3}{2}}$ while, under mild regularity conditions, achieving the same convergence guarantees (up to logarithmic factors) as MFLD. Our theoretical analysis is empirically confirmed on tasks including the training of student-teacher neural networks, quantization with maximum mean discrepancy, and computation of predictively-oriented posteriors in a post-Bayesian framework.
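To make the cost comparison concrete, the following is a minimal sketch of one Euler–Maruyama step of a toy interacting particle system in one dimension, assuming a Gaussian kernel and an MMD-type objective. It is not the authors' implementation: the names `kernel_grad`, `random_coreset`, and `langevin_step` are hypothetical, and the coreset is drawn by uniform subsampling as a stand-in for kernel thinning, so it reproduces only the coreset size of $\lceil N^{1/2} \rceil$, not the approximation guarantees of kernel thinning.

```python
import math
import random


def kernel_grad(x, y, bandwidth=1.0):
    """Gradient in x of the Gaussian kernel exp(-(x - y)^2 / (2 h^2)), scalar inputs."""
    diff = x - y
    return -diff / bandwidth**2 * math.exp(-diff * diff / (2.0 * bandwidth**2))


def random_coreset(particles, rng):
    """Uniform subsample of size ceil(sqrt(N)).

    ASSUMPTION: this is a placeholder for kernel thinning, not kernel thinning itself.
    """
    m = math.isqrt(len(particles) - 1) + 1  # ceil(sqrt(N)) for N >= 1
    return rng.sample(particles, m)


def langevin_step(particles, support, target, step, lam, rng):
    """One Euler-Maruyama step in which each particle interacts only with `support`.

    Returns the updated particles and the number of particle-particle
    kernel-gradient evaluations performed.
    """
    new_particles = []
    evals = 0
    for x in particles:
        # Interaction with the (possibly thinned) particle cloud.
        interaction = sum(kernel_grad(x, y) for y in support) / len(support)
        evals += len(support)
        # Interaction with fixed target samples; this cost is linear in N.
        attraction = sum(kernel_grad(x, t) for t in target) / len(target)
        # Drift is minus the gradient of the first variation of the MMD objective.
        drift = -(interaction - attraction)
        # Entropy regularization of strength lam appears as Gaussian noise.
        noise = math.sqrt(2.0 * lam * step) * rng.gauss(0.0, 1.0)
        new_particles.append(x + step * drift + noise)
    return new_particles, evals


rng = random.Random(0)
N = 64
particles = [rng.gauss(0.0, 1.0) for _ in range(N)]
target = [rng.gauss(2.0, 0.5) for _ in range(8)]

# Full interaction: every particle sees all N particles.
_, full_evals = langevin_step(particles, particles, target, 0.05, 0.01, rng)

# Thinned interaction: every particle sees only the coreset.
coreset = random_coreset(particles, rng)
thinned, thin_evals = langevin_step(particles, coreset, target, 0.05, 0.01, rng)
```

With $N = 64$, the full step performs $N^2 = 4096$ particle-particle kernel-gradient evaluations, while the thinned step performs $N \cdot \lceil N^{1/2} \rceil = 512$, which illustrates the $N^2$ versus $N^{3/2}$ scaling described in the abstract.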