🤖 AI Summary
This work studies the convergence rate of the KL divergence for diffusion model sampling under minimal assumptions, in particular without requiring any smoothness of the target density. The method decomposes the reverse-time sampling process into alternating steps of the probability flow ODE and small-noise perturbations, and introduces a noise-injection mechanism that enables a controlled conversion from Wasserstein error to KL divergence even in nonsmooth settings. Theoretically, it is shown that only $\tilde{O}(d\log^{3/2}(1/\delta)/\varepsilon)$ discretization steps suffice to achieve KL divergence $O(\varepsilon^2)$ to a Gaussian-perturbed target distribution, improving upon the prior best-known bound of $\tilde{O}(d\log^2(1/\delta)/\varepsilon^2)$. The key contribution is a dimension-linear KL convergence guarantee with an improved $1/\varepsilon$ dependence (down from $1/\varepsilon^2$), obtained entirely without smoothness assumptions on the target density.
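As a rough illustration of the alternating scheme described above, the sketch below interleaves an exponential-integrator step of the reverse probability flow ODE for the Ornstein-Uhlenbeck forward process with a small noising step along that forward process. This is a minimal sketch under assumptions, not the authors' exact algorithm: the score estimate `score_fn`, the split parameter `noise_frac`, and the geometric time grid are illustrative choices not taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's exact scheme) of a sampler that
# alternates probability-flow-ODE steps with small noising steps along the
# forward Ornstein-Uhlenbeck process dX_t = -X_t dt + sqrt(2) dB_t.
import numpy as np

def sample(score_fn, dim, T=5.0, delta=1e-3, n_steps=500, noise_frac=0.5, rng=None):
    """Run time backward from T to delta; score_fn(x, t) estimates grad log p_t(x)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(dim)               # p_T is close to N(0, I)
    ts = np.geomspace(T, delta, n_steps + 1)   # finer steps as t approaches delta
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        h = t_cur - t_next                     # net backward time to cover
        h_ode = (1.0 + noise_frac) * h         # ODE step overshoots slightly...
        h_noise = noise_frac * h               # ...and the noising step returns
                                               # the clock to t_next.
        # Reverse probability-flow ODE dx/dt = -x - score(x, t), integrated
        # backward over h_ode with the score frozen at (x, t_cur).
        s = score_fn(x, t_cur)
        x = np.exp(h_ode) * x + (np.exp(h_ode) - 1.0) * s
        # Small forward noising step along the OU process over time h_noise.
        x = (np.exp(-h_noise) * x
             + np.sqrt(1.0 - np.exp(-2.0 * h_noise)) * rng.standard_normal(dim))
    return x

# Toy check: if the data distribution is N(0, I), then p_t = N(0, I) for all t,
# the true score is -x, and the output should be approximately standard normal.
draw = sample(lambda x, t: -x, dim=3)
```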
📝 Abstract
Diffusion-based generative models have emerged as highly effective methods for synthesizing high-quality samples. Recent works have focused on analyzing the convergence of their generation process with minimal assumptions, either through reverse SDEs or Probability Flow ODEs. The best known guarantees for the KL divergence without any smoothness assumptions so far achieve a linear dependence on the data dimension $d$ and an inverse quadratic dependence on $\varepsilon$. In this work, we present a refined analysis that improves the dependence on $\varepsilon$. We model the generation process as a composition of two steps: a reverse ODE step, followed by a smaller noising step along the forward process. This design leverages the fact that the ODE step enables control of a Wasserstein-type error, which can then be converted into a KL divergence bound via noise addition, leading to a better dependence on the discretization step size. We further provide a novel analysis that achieves the linear $d$-dependence for the error due to discretizing this Probability Flow ODE in the absence of any smoothness assumptions. We show that $\tilde{O}\left(\frac{d\log^{3/2}(\frac{1}{\delta})}{\varepsilon}\right)$ steps suffice to approximate the target distribution corrupted with Gaussian noise of variance $\delta$ to within $O(\varepsilon^2)$ in KL divergence, improving upon the previous best result, which requires $\tilde{O}\left(\frac{d\log^2(\frac{1}{\delta})}{\varepsilon^2}\right)$ steps.
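The conversion from Wasserstein error to KL divergence via noise addition can be made concrete through the following standard inequality, stated here as a plausible ingredient rather than the paper's exact lemma: convolving both distributions with Gaussian noise turns squared Wasserstein distance into a KL bound.

```latex
% For distributions P, Q on R^d and N_sigma = N(0, sigma^2 I_d), joint convexity
% of the KL divergence over couplings pi in Pi(P, Q), together with the closed
% form for the KL divergence between equal-covariance Gaussians, gives
\[
  \mathrm{KL}\!\left(P * \mathcal{N}_\sigma \,\middle\|\, Q * \mathcal{N}_\sigma\right)
  \;\le\;
  \inf_{\pi \in \Pi(P,Q)} \mathbb{E}_{(X,Y)\sim \pi}\!\left[\frac{\|X - Y\|^2}{2\sigma^2}\right]
  \;=\;
  \frac{W_2^2(P, Q)}{2\sigma^2}.
\]
```

Under a bound of this type, a per-step Wasserstein error that scales linearly in the step size enters the KL bound quadratically, which is consistent with the improved $1/\varepsilon$ (rather than $1/\varepsilon^2$) step-count dependence stated above.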