🤖 AI Summary
This work investigates how diffusion models generate highly realistic images that differ from their training data, a capability referred to as “creativity”, and presents evidence that it arises from an interaction between the denoiser architecture and the target data distribution. Theoretically, the study derives explicit forms of the generated distribution for linear, polynomial, and bottleneck denoisers. Empirically, it evaluates several architectures, including UNet variants, and shows that small modifications to the UNet lead to very different forms of creativity and often produce highly nonrealistic samples. Together, the findings indicate that a diffusion model succeeds only when the inductive bias of its denoiser is strongly aligned with the target distribution.
📝 Abstract
The creativity of diffusion models refers to their ability to generate highly realistic images that differ from their training data. Creativity is somewhat surprising: it is known that if the denoiser used in a diffusion model is the Bayes optimal denoiser for a given training set, the model will simply copy the training samples. In this paper we present empirical and theoretical results suggesting that creativity in diffusion models is due to an interaction between the denoiser architecture and the target distribution. Theoretically, we give explicit forms for the distribution of generated samples as a function of the target distribution and the denoiser architecture, for three denoiser architectures (linear, polynomial, bottleneck). Empirically, we show that small changes in the popular UNet denoiser architecture lead to very different forms of creativity, and that these small changes often yield highly nonrealistic samples. Taken together, our results show that diffusion models will only be successful if the inductive bias of the denoiser architecture is in strong alignment with the true target distribution.
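
The copying behavior of the Bayes optimal denoiser can be illustrated with a small sketch. It is not taken from the paper; it assumes the standard setup in which the data distribution is uniform over the training set and the noisy observation is y = x + sigma * noise with Gaussian noise. Under those assumptions the optimal denoiser is the posterior mean, a softmax-weighted average of the training samples. The function name, the toy training set, and the noise levels are all hypothetical choices made for illustration.

```python
import numpy as np

def bayes_optimal_denoiser(y, train, sigma):
    """Posterior mean E[x | y], assuming x is uniform over `train`
    and y = x + sigma * standard Gaussian noise (illustrative assumption)."""
    # Squared distance from the noisy point to every training sample.
    sq_dist = np.sum((train - y) ** 2, axis=1)
    # Softmax over -sq_dist / (2 sigma^2), shifted for numerical stability.
    logits = -sq_dist / (2.0 * sigma ** 2)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Weighted average of the training samples.
    return weights @ train

# Toy 2-D training set (hypothetical values).
train = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0], [2.0, 2.0]])
y = np.array([3.9, 0.1])  # a noisy point near the training sample (4, 0)

low_noise = bayes_optimal_denoiser(y, train, sigma=0.1)
high_noise = bayes_optimal_denoiser(y, train, sigma=100.0)
print(low_noise, high_noise)
```

At the low noise level the output is essentially the training sample (4, 0), while at the high noise level it sits near the training mean (2, 2). Because the final, low-noise steps of sampling pull every point onto its nearest training sample, a diffusion model driven by this denoiser reproduces the training set, which is why any creativity must come from the denoiser departing from this optimum.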