🤖 AI Summary
This work investigates the generalization behavior of diffusion models in high-dimensional settings, where perfect training and sampling are known to cause memorization of the training data, so some form of regularization is essential for generalization. The authors introduce the notion of "score stability" and develop an algorithm-dependent generalization theory grounded in algorithmic stability, rather than relying on algorithm-independent tools such as uniform convergence and assumptions about model structure. Score stability quantifies the sensitivity of score-matching algorithms to dataset perturbations, and generalization error bounds are derived in terms of it. Applying the framework to denoising score matching with early stopping, coarse discretization of the sampler, and optimization with SGD identifies denoising, sampler, and optimization regularization as implicit sources of generalization in diffusion models that prior analyses have overlooked.
📝 Abstract
The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data -- implying that some form of regularisation is essential for generalisation. Existing theoretical analyses primarily rely on algorithm-independent techniques such as uniform convergence, heavily utilising model structure to obtain generalisation bounds. In this work, we instead leverage the algorithmic aspects that promote generalisation in diffusion models, developing a general theory of algorithm-dependent generalisation for this setting. Borrowing from the framework of algorithmic stability, we introduce the notion of score stability, which quantifies the sensitivity of score-matching algorithms to dataset perturbations. We derive generalisation bounds in terms of score stability, and apply our framework to several fundamental learning settings, identifying sources of regularisation. In particular, we consider denoising score matching with early stopping (denoising regularisation), sampler-wide coarse discretisation (sampler regularisation) and optimising with SGD (optimisation regularisation). By grounding our analysis in algorithmic properties rather than model structure, we identify multiple sources of implicit regularisation unique to diffusion models that have so far been overlooked in the literature.
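As a rough illustration of the kind of quantity the abstract describes (not the paper's exact definition or notation), a score-stability notion in the spirit of classical uniform algorithmic stability might be written as follows, where $S$ and $S'$ are neighbouring training sets differing in a single example and $s_{\theta(S)}$ denotes the score estimate that the learning algorithm produces from $S$:

```latex
% Illustrative sketch only: \varepsilon_{\mathrm{score}}, s_{\theta(S)} are
% placeholder symbols, not the paper's notation.
\varepsilon_{\mathrm{score}}
  \;=\;
  \sup_{S \simeq S'}\;
  \sup_{t \in [0, T]}\;
  \mathbb{E}_{x_t}
  \bigl\|\, s_{\theta(S)}(x_t, t) - s_{\theta(S')}(x_t, t) \,\bigr\|^2 .
```

In analogy with classical stability-based analyses, a generalisation bound would then scale with this sensitivity term: algorithms whose learned scores change little under single-example perturbations (e.g. due to early stopping or coarse sampler discretisation) would enjoy smaller generalisation error.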