๐ค AI Summary
This work addresses the lack of theoretical characterization of generalization in diffusion models. We systematically investigate the generalization error mechanisms under various training dynamics, particularly revealing how mode shifts in the target distribution impair generalization. By integrating score-matching analysis, stochastic process theory, and generalization error decomposition, we establishโfor the first timeโa training-dynamic-coupled generalization upper bound $O(n^{-2/5} + m^{-4/5})$ that exhibits non-exponential dependence on dimensionality, thereby overcoming the classical curse of dimensionality. We theoretically prove that early stopping induces polynomial decay of the estimation error, with convergence rates explicitly improving in both sample size $n$ and model capacity $m$. Numerical experiments validate the tightness and practical efficacy of the bound. Our results provide interpretable, quantitative guidance for the theoretical understanding, architectural design, and training protocol optimization of diffusion models.
๐ Abstract
Diffusion models are a class of generative models that serve to establish a stochastic transport map between an empirically observed, yet unknown, target distribution and a known prior. Despite their remarkable success in real-world applications, a theoretical understanding of their generalization capabilities remains underdeveloped. This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models. We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models, suggesting a polynomially small generalization error ($O(n^{-2/5}+m^{-4/5})$) on both the sample size $n$ and the model capacity $m$, evading the curse of dimensionality (i.e., not exponentially large in the data dimension) when early-stopped. Furthermore, we extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities with progressively increasing distances between modes. This precisely elucidates the adverse effect of"modes shift"in ground truths on the model generalization. Moreover, these estimates are not solely theoretical constructs but have also been confirmed through numerical simulations. Our findings contribute to the rigorous understanding of diffusion models' generalization properties and provide insights that may guide practical applications.