🤖 AI Summary
This work studies the convergence rate of the KL divergence for diffusion model sampling under minimal assumptions, in particular without requiring any smoothness of the target density. The method decomposes the reverse-time sampling process into alternating steps of the probability flow ODE and small-noise perturbations, and introduces a noise-injection mechanism that enables a controlled conversion from Wasserstein error to KL divergence even in nonsmooth settings. Theoretically, it is shown that only $\tilde{O}(d\log^{3/2}(1/\delta)/\varepsilon)$ discretization steps suffice to achieve KL divergence $O(\varepsilon^2)$ to a Gaussian-perturbed target distribution, improving upon the prior best-known bound of $\tilde{O}(d\log^2(1/\delta)/\varepsilon^2)$. The key contribution is a dimension-linear KL convergence guarantee with an improved $1/\varepsilon$ dependence (down from $1/\varepsilon^2$), obtained entirely without smoothness assumptions on the target density.
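As a rough illustration of the alternating scheme described above, the sketch below interleaves an exponential-integrator step of the reverse probability flow ODE for the Ornstein-Uhlenbeck forward process with a small noising step along that forward process. This is a minimal sketch under assumptions, not the authors' exact algorithm: the score estimate `score_fn`, the split parameter `noise_frac`, and the geometric time grid are illustrative choices not taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's exact scheme) of a sampler that
# alternates probability-flow-ODE steps with small noising steps along the
# forward Ornstein-Uhlenbeck process dX_t = -X_t dt + sqrt(2) dB_t.
import numpy as np

def sample(score_fn, dim, T=5.0, delta=1e-3, n_steps=500, noise_frac=0.5, rng=None):
    """Run time backward from T to delta; score_fn(x, t) estimates grad log p_t(x)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(dim)               # p_T is close to N(0, I)
    ts = np.geomspace(T, delta, n_steps + 1)   # finer steps as t approaches delta
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        h = t_cur - t_next                     # net backward time to cover
        h_ode = (1.0 + noise_frac) * h         # ODE step overshoots slightly...
        h_noise = noise_frac * h               # ...and the noising step returns
                                               # the clock to t_next.
        # Reverse probability-flow ODE dx/dt = -x - score(x, t), integrated
        # backward over h_ode with the score frozen at (x, t_cur).
        s = score_fn(x, t_cur)
        x = np.exp(h_ode) * x + (np.exp(h_ode) - 1.0) * s
        # Small forward noising step along the OU process over time h_noise.
        x = (np.exp(-h_noise) * x
             + np.sqrt(1.0 - np.exp(-2.0 * h_noise)) * rng.standard_normal(dim))
    return x

# Toy check: if the data distribution is N(0, I), then p_t = N(0, I) for all t,
# the true score is -x, and the output should be approximately standard normal.
draw = sample(lambda x, t: -x, dim=3)
```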
📝 Abstract
Diffusion-based generative models have emerged as highly effective methods for synthesizing high-quality samples. Recent works have focused on analyzing the convergence of their generation process with minimal assumptions, either through reverse SDEs or Probability Flow ODEs. The best known guarantees for the KL divergence without any smoothness assumptions so far achieve a linear dependence on the data dimension $d$ and an inverse quadratic dependence on $\varepsilon$. In this work, we present a refined analysis that improves the dependence on $\varepsilon$. We model the generation process as a composition of two steps: a reverse ODE step, followed by a smaller noising step along the forward process. This design leverages the fact that the ODE step enables control of a Wasserstein-type error, which can then be converted into a KL divergence bound via noise addition, leading to a better dependence on the discretization step size. We further provide a novel analysis that achieves the linear $d$-dependence for the error due to discretizing this Probability Flow ODE in the absence of any smoothness assumptions. We show that $\tilde{O}\left(\frac{d\log^{3/2}(\frac{1}{\delta})}{\varepsilon}\right)$ steps suffice to approximate the target distribution corrupted with Gaussian noise of variance $\delta$ to within $O(\varepsilon^2)$ in KL divergence, improving upon the previous best result, which requires $\tilde{O}\left(\frac{d\log^2(\frac{1}{\delta})}{\varepsilon^2}\right)$ steps.
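The conversion from Wasserstein error to KL divergence via noise addition can be made concrete through the following standard inequality, stated here as a plausible ingredient rather than the paper's exact lemma: convolving both distributions with Gaussian noise turns squared Wasserstein distance into a KL bound.

```latex
% For distributions P, Q on R^d and N_sigma = N(0, sigma^2 I_d), joint convexity
% of the KL divergence over couplings pi in Pi(P, Q), together with the closed
% form for the KL divergence between equal-covariance Gaussians, gives
\[
  \mathrm{KL}\!\left(P * \mathcal{N}_\sigma \,\middle\|\, Q * \mathcal{N}_\sigma\right)
  \;\le\;
  \inf_{\pi \in \Pi(P,Q)} \mathbb{E}_{(X,Y)\sim \pi}\!\left[\frac{\|X - Y\|^2}{2\sigma^2}\right]
  \;=\;
  \frac{W_2^2(P, Q)}{2\sigma^2}.
\]
```

Under a bound of this type, a per-step Wasserstein error that scales linearly in the step size enters the KL bound quadratically, which is consistent with the improved $1/\varepsilon$ (rather than $1/\varepsilon^2$) step-count dependence stated above.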