🤖 AI Summary
This work studies the sample complexity of training diffusion models: how many samples are needed to achieve high-accuracy learning, assuming the neural network is sufficiently expressive. Methodologically, it gives the first analysis in which the Wasserstein approximation error depends logarithmically (rather than exponentially) on network depth; this is achieved by treating the total variation and Wasserstein metrics in a single framework that combines probabilistic distance analysis, generalization error bounds, and the theory of reverse diffusion. The main contributions are: (i) the tightest known polynomial upper bound on the sample complexity of learning a diffusion model; (ii) markedly improved dependence of the bound on key parameters, including the Wasserstein error tolerance, the data dimension, and the network depth; and (iii) correspondingly stronger theoretical guarantees on the learnability of diffusion models.
📝 Abstract
Diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. From a theoretical standpoint, a number of recent works [chen2022, chen2022improved, benton2023linear] have studied the iteration complexity of sampling, assuming access to an accurate diffusion model. In this work, we focus on understanding the *sample complexity* of training such a model: how many samples are needed to learn an accurate diffusion model using a sufficiently expressive neural network? Prior work [BMR20] showed bounds polynomial in the dimension, the desired total variation error, and the Wasserstein error. We show an *exponential improvement* in the dependence on the Wasserstein error and depth, along with improved dependencies on other relevant parameters.
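For context, "training such a model" refers to the standard denoising (noise-prediction) objective that diffusion models minimize over training samples. The sketch below is a minimal illustration of one Monte Carlo estimate of that objective, not the paper's method: the linear `eps_model`, the schedule values, and all sizes are hypothetical stand-ins, and the paper instead assumes a sufficiently expressive neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the denoiser network eps_theta(x_t, t); the paper
# assumes a sufficiently expressive neural network, not a linear map.
def eps_model(x_t, t, W):
    return x_t @ W  # predicted noise

def ddpm_loss(x0, t, alpha_bar, W, rng):
    """One Monte Carlo estimate of the standard denoising objective
    E ||eps - eps_theta(x_t, t)||^2, where the noised sample is
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    pred = eps_model(x_t, t, W)
    return float(np.mean((eps - pred) ** 2))

# n training samples in d dimensions: the paper asks how large n must be.
n, d, T = 128, 4, 10
x0 = rng.standard_normal((n, d))
betas = np.linspace(1e-4, 0.2, T)          # illustrative noise schedule
alpha_bar = np.cumprod(1.0 - betas)
W = np.zeros((d, d))                        # predicting zero noise
loss = ddpm_loss(x0, t=5, alpha_bar=alpha_bar, W=W, rng=rng)
```

Since the zero predictor ignores the noise entirely, the loss is roughly the variance of the standard Gaussian noise; a trained network drives this objective down, and the paper's question is how many samples `x0` this requires to learn the distribution accurately.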