🤖 AI Summary
Diffusion models generalize well in practice, even though, unlike in supervised learning, memorization and generalization are incompatible for them: a model that fully memorizes its training data only reproduces copies. This places them outside both classical statistical learning theory and the benign overfitting paradigm. The work argues for shifting the focus from “why models do not memorize” to “what is learned during the pre-memorization phase.” It also notes that existing explanations based on model capacity, optimization-induced implicit regularization, and architectural inductive biases have not yet been reconciled with one another. An empirical study on CIFAR-10 indicates that diffusion models acquire essential structural features of the data well before memorization occurs, which points to a generalization mechanism specific to these models. The findings motivate generalization theories tailored to generative models and are distilled into concrete open questions for future work.
📝 Abstract
This position paper argues that understanding generalization in diffusion models requires fundamentally new theoretical frameworks that go beyond both classical statistical learning theory and the benign overfitting paradigm developed for supervised learning. In diffusion models, unlike in supervised learning, memorization of training data and generalization to novel samples are incompatible: a model that has fully memorized its training set generates copies rather than novel data. Several theoretical explanations have been proposed for why practical diffusion models nevertheless generalize, based on capacity limitations, implicit regularization from optimization, or architectural inductive biases, but how these mechanisms interact remains unclear. We argue that the field should pivot from explaining why diffusion models do not memorize to investigating what the model actually learns during the pre-memorization phase. To support this position, we conduct an empirical study of diffusion models trained on CIFAR-10, and we distill the findings into concrete open questions that we believe are key to improving the understanding of generalization in diffusion models.
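The claim that a fully memorizing model "generates copies rather than novel data" can be illustrated with a small sketch. One common way to formalize full memorization is a model whose denoiser equals the optimal denoiser for the empirical training distribution. The sketch assumes a variance-exploding noise model x_σ = x_0 + σ·ε, a hypothetical 5-point training set in 2D, and a plain Euler integrator for the probability-flow ODE dx/dσ = (x − D(x, σ))/σ. The function names, noise schedule, and step count are illustrative choices, not the paper's setup.

```python
import numpy as np


def empirical_denoiser(x, data, sigma):
    """Optimal denoiser E[x0 | x_sigma = x] when x0 is uniform over `data`.

    Assumes x_sigma = x0 + sigma * eps. The posterior over training points is
    a softmax of -||x - x_i||^2 / (2 sigma^2); the denoiser is the weighted mean.
    """
    sq_dist = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma**2)
    logits -= logits.max()  # subtract the max for a numerically stable softmax
    w = np.exp(logits)
    w /= w.sum()
    return w @ data


def sample_memorizing_model(data, rng, sigma_max=80.0, sigma_min=0.002, steps=200):
    """Euler steps of dx/dsigma = (x - D(x, sigma)) / sigma from sigma_max down to sigma_min.

    The schedule values are hypothetical illustration choices.
    """
    sigmas = np.geomspace(sigma_max, sigma_min, steps)
    x = sigma_max * rng.standard_normal(data.shape[1])  # start from pure noise
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = empirical_denoiser(x, data, s)
        x = x + (s_next - s) * (x - d) / s
    return x


rng = np.random.default_rng(0)
train = rng.standard_normal((5, 2))  # hypothetical tiny "training set"
sample = sample_memorizing_model(train, rng)
nearest = np.min(np.linalg.norm(train - sample, axis=1))
print(f"distance to nearest training point: {nearest:.4f}")
```

As σ shrinks, the softmax weights concentrate on a single training point, so every trajectory collapses onto one of the five inputs. The sampler therefore reproduces training data instead of producing novel samples. Practical networks only approximate this optimum, and that gap is where the question of what they learn before memorization arises.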