🤖 AI Summary
The high-density regions of a diffusion model's learned distribution unexpectedly contain cartoon-like or blurry images, even though no such images appear in the training data.
Method: We propose a mode-tracking theory that pinpoints the exact mode of the diffusion denoising distribution; an SDE likelihood-tracking method that estimates and optimizes sample likelihood at no additional computational cost; and an efficient high-density sampler that targets the atypical, high-likelihood samples that conventional samplers overlook.
Results: Experiments show a substantial increase in sample likelihood and stable generation of cartoon-like or blurry high-density images. The phenomenon also appears in models trained purely on real-image datasets, which points to a structural bias intrinsic to diffusion models.
📝 Abstract
We investigate what kind of images lie in the high-density regions of diffusion models. We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-density sampler that consistently generates images of higher likelihood than usual samplers. Our empirical findings reveal the existence of significantly higher likelihood samples that typical samplers do not produce, often manifesting as cartoon-like drawings or blurry images depending on the noise level. Curiously, these patterns emerge in datasets devoid of such examples. We also present a novel approach to track sample likelihoods in diffusion SDEs, which remarkably incurs no additional computational cost.
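The claim that much higher-likelihood samples exist but are never produced by typical samplers can seem counterintuitive, so a toy illustration may help. The sketch below is not the paper's sampler or its likelihood tracker. It uses a standard Gaussian in 1000 dimensions as a stand-in for a learned distribution (an assumption made purely for illustration) and compares the log-density at the mode with the log-density of ordinary random draws. The function names are hypothetical.

```python
import math
import random


def gaussian_log_density(x):
    """Log-density of a standard normal N(0, I) at the point x (a list of floats)."""
    dim = len(x)
    squared_norm = sum(v * v for v in x)
    return -0.5 * squared_norm - 0.5 * dim * math.log(2.0 * math.pi)


def compare_mode_to_samples(dim=1000, n_samples=200, seed=0):
    """Return the log-density at the mode, and the best and mean log-density of random draws."""
    rng = random.Random(seed)
    # The mode of N(0, I) is the origin: the single highest-density point.
    mode_logp = gaussian_log_density([0.0] * dim)
    # Ordinary sampling: each draw lands near a shell of radius about sqrt(dim).
    sample_logps = [
        gaussian_log_density([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(n_samples)
    ]
    best_sample_logp = max(sample_logps)
    mean_sample_logp = sum(sample_logps) / n_samples
    return mode_logp, best_sample_logp, mean_sample_logp


mode_logp, best_sample_logp, mean_sample_logp = compare_mode_to_samples()
print("log-density at the mode:       ", round(mode_logp, 1))
print("best log-density among samples:", round(best_sample_logp, 1))
print("mean log-density among samples:", round(mean_sample_logp, 1))
```

In this toy setting the mode's log-density exceeds that of every random draw by roughly dim / 2 nats, yet an ordinary sampler essentially never lands near it, because almost all probability mass sits far from the mode in high dimensions. The abstract reports an analogous gap for diffusion models, where the high-density points take the form of cartoon-like or blurry images.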