🤖 AI Summary
Problem: For high-dimensional probability distributions whose log-gradients grow super-linearly (i.e., are not globally Lipschitz), existing Langevin sampling algorithms offer only weak non-asymptotic convergence guarantees.
Method: We propose kTULA, a novel tamed Langevin dynamics-based algorithm that tames the super-linearly growing log-gradient in the numerical discretization (a schematic tamed update is sketched below), with an analysis covering both sampling and the associated non-convex optimization problems.
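For orientation, tamed schemes replace the raw drift of the unadjusted Langevin algorithm with a step-size-dependent surrogate whose increment stays bounded. A schematic update, using the classical taming of TULA (Brosse et al.) rather than the exact kTULA coefficient, reads

$$
\theta_{n+1} = \theta_n - \lambda\, h_\lambda(\theta_n) + \sqrt{2\lambda}\,\xi_{n+1}, \qquad h_\lambda(\theta) = \frac{\nabla u(\theta)}{1 + \lambda \lVert \nabla u(\theta) \rVert},
$$

where $\pi \propto e^{-u}$ is the target distribution, $\lambda > 0$ is the step size, and $(\xi_n)_{n \ge 1}$ are i.i.d. standard Gaussian vectors. Since $\lVert \lambda\, h_\lambda(\theta) \rVert \le 1$ for every $\theta$, the iterates remain stable even when $\nabla u$ grows super-linearly.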
Contribution/Results: kTULA admits a non-asymptotic convergence bound in KL divergence with rate of convergence $2-\overline{\epsilon}$, $\overline{\epsilon}>0$, the best known for this setting and a significant improvement over prior bounds. This in turn yields an improved non-asymptotic error bound in Wasserstein-2 distance and a non-asymptotic guarantee for kTULA applied to the associated non-convex optimization problems. The analysis accommodates super-linearly growing log-gradients without requiring global Lipschitz continuity. As illustrations, kTULA is applied to sampling from a high-dimensional double-well potential distribution and to an optimization problem involving a neural network, with the main results providing theoretical guarantees in both cases.
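To make the rate concrete: "rate $2-\overline{\epsilon}$" refers to the step-size exponent in a non-asymptotic bound whose schematic shape (illustrative only; see the paper for the exact statement and constants) is

$$
\operatorname{KL}\big(\mathcal{L}(\theta_n)\,\|\,\pi\big) \le C_1 e^{-C_2 n \lambda} + C_3 \lambda^{2-\overline{\epsilon}},
$$

so halving $\lambda$ shrinks the discretization term by nearly a factor of four, whereas an order-$1$ bound gains only a factor of two.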
📝 Abstract
Motivated by applications in deep learning, where the global Lipschitz continuity condition is often not satisfied, we examine the problem of sampling from distributions with super-linearly growing log-gradients. We propose a novel tamed Langevin dynamics-based algorithm, called kTULA, to solve the aforementioned sampling problem, and provide a theoretical guarantee for its performance. More precisely, we establish a non-asymptotic convergence bound in Kullback-Leibler (KL) divergence with the best-known rate of convergence equal to $2-\overline{\epsilon}$, $\overline{\epsilon}>0$, which significantly improves relevant results in existing literature. This enables us to obtain an improved non-asymptotic error bound in Wasserstein-2 distance, which can be used to further derive a non-asymptotic guarantee for kTULA to solve the associated optimization problems. To illustrate the applicability of kTULA, we apply the proposed algorithm to the problem of sampling from a high-dimensional double-well potential distribution and to an optimization problem involving a neural network. We show that our main results can be used to provide theoretical guarantees for the performance of kTULA.
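For illustration, below is a minimal Python sketch of a tamed Langevin sampler applied to the standard double-well potential $u(\theta) = \tfrac{1}{4}\lVert\theta\rVert^4 - \tfrac{1}{2}\lVert\theta\rVert^2$, whose gradient $(\lVert\theta\rVert^2 - 1)\theta$ grows cubically and hence is not globally Lipschitz. This uses the generic taming function from the schematic update above, not necessarily the exact kTULA coefficient, and the setup may differ from the paper's experiments; function names are ours.

```python
import numpy as np

def grad_u(theta):
    """Gradient of the double-well potential u(theta) = |theta|^4/4 - |theta|^2/2.
    Grows cubically in |theta|, so it is not globally Lipschitz."""
    return (np.dot(theta, theta) - 1.0) * theta

def tamed_langevin(grad, theta0, step, n_steps, rng):
    """Generic tamed unadjusted Langevin iteration (a sketch; the exact
    kTULA taming function differs -- see the paper)."""
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_steps, theta.size))
    for n in range(n_steps):
        g = grad(theta)
        # Taming: the effective drift step*g/(1 + step*|g|) stays bounded,
        # preventing the blow-up plain ULA suffers under superlinear growth.
        tamed_drift = g / (1.0 + step * np.linalg.norm(g))
        theta = theta - step * tamed_drift + np.sqrt(2.0 * step) * rng.standard_normal(theta.size)
        samples[n] = theta
    return samples

rng = np.random.default_rng(0)
samples = tamed_langevin(grad_u, theta0=np.zeros(2), step=1e-2, n_steps=50_000, rng=rng)
print("sample mean:", samples[10_000:].mean(axis=0))  # discard burn-in
```

With plain (untamed) ULA, an iterate that lands far from the origin receives a drift of cubic magnitude and can diverge in a few steps; the taming caps each drift increment at norm 1, which is what makes the scheme stable at any fixed step size.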