🤖 AI Summary
Diffusion models generally lack explicit regularization in the latent space, causing their development to diverge from advances in representation learning.
Method: We propose a plug-and-play Dispersive Loss that explicitly encourages internal representations to disperse in the latent space, without requiring positive sample pairs, modifying the sampling procedure, introducing auxiliary parameters, or relying on pretraining. This loss constitutes a self-contained, dependency-free regularization mechanism—analogous to contrastive learning but without its positive-pair term—and thereby avoids the inherent conflict between positive-pair construction and generation consistency.
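As a hedged illustration of the idea, the InfoNCE-style variant described in the paper can be sketched as a log-mean-exp of negative pairwise squared distances over a batch of representations; minimizing it pushes representations apart. The function name, temperature value, and NumPy implementation below are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def dispersive_loss(z, tau=0.5):
    """Sketch of an InfoNCE-style dispersive regularizer (illustrative,
    not the paper's reference implementation): penalizes representations
    that cluster together via log-mean-exp of negative pairwise squared
    L2 distances. No positive pairs are needed."""
    z = z.reshape(z.shape[0], -1)                        # flatten to (B, D)
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # (B, B) pairwise ||z_i - z_j||^2
    return float(np.log(np.exp(-sq / tau).mean()))       # lower when points are spread out
```

A batch of identical representations yields the maximal value (log 1 = 0), while well-separated representations drive the loss toward more negative values, so adding this term to the diffusion training objective rewards dispersion.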
Results: On ImageNet, Dispersive Loss consistently improves FID and Inception Score across multiple diffusion model architectures, outperforming widely used and strong baselines. The results provide empirical evidence that generative modeling and representation learning can be jointly optimized to mutual benefit, with gains in both sample quality and latent-space structure.
📝 Abstract
The development of diffusion-based generative models over the past decade has largely proceeded independently of progress in representation learning. These diffusion models typically rely on regression-based objectives and generally lack explicit regularization. In this work, we propose *Dispersive Loss*, a simple plug-and-play regularizer that effectively improves diffusion-based generative models. Our loss function encourages internal representations to disperse in the hidden space, analogous to contrastive self-supervised learning, with the key distinction that it requires no positive sample pairs and therefore does not interfere with the sampling process used for regression. Compared to the recent method of representation alignment (REPA), our approach is self-contained and minimalist, requiring no pre-training, no additional parameters, and no external data. We evaluate Dispersive Loss on the ImageNet dataset across a range of models and report consistent improvements over widely used and strong baselines. We hope our work will help bridge the gap between generative modeling and representation learning.