Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

📅 2025-05-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Why do diffusion models generalize well and resist overfitting? This work uncovers a critical phase transition between generalization and memorization in their training dynamics: two distinct timescales emerge, an early-stage generation time τ_gen (constant in n) at which high-fidelity samples appear, and a late-stage memorization time τ_mem (scaling linearly with the dataset size n). This yields a generalization window that expands as n increases. We establish, for the first time, an implicit dynamical regularization in diffusion models: overparameterized architectures avoid memorization solely by controlling training duration, with no explicit regularizers or architectural constraints needed. Our methodology integrates empirical U-Net analysis, high-dimensional random-feature theory, and large-scale numerical simulations, validating τ_mem ∝ n on both synthetic and real data. Theory and experiments jointly demonstrate that, beyond a model-dependent threshold in n, overfitting vanishes entirely even at infinite training time.

📝 Abstract
Diffusion models have achieved remarkable success across a wide range of generative tasks. A key challenge is understanding the mechanisms that prevent their memorization of training data and allow generalization. In this work, we investigate the role of the training dynamics in the transition from generalization to memorization. Through extensive experiments and theoretical analysis, we identify two distinct timescales: an early time $\tau_\mathrm{gen}$ at which models begin to generate high-quality samples, and a later time $\tau_\mathrm{mem}$ beyond which memorization emerges. Crucially, we find that $\tau_\mathrm{mem}$ increases linearly with the training set size $n$, while $\tau_\mathrm{gen}$ remains constant. This creates a window of training times, growing with $n$, in which models generalize effectively, despite showing strong memorization if training continues beyond it. It is only when $n$ becomes larger than a model-dependent threshold that overfitting disappears at infinite training times. These findings reveal a form of implicit dynamical regularization in the training dynamics, which allows memorization to be avoided even in highly overparameterized settings. Our results are supported by numerical experiments with standard U-Net architectures on realistic and synthetic datasets, and by a theoretical analysis using a tractable random features model studied in the high-dimensional limit.
Problem

Research questions and friction points this paper is trying to address.

Understanding mechanisms preventing diffusion models from memorizing training data
Investigating training dynamics in transition from generalization to memorization
Identifying timescales for generalization and memorization in diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifies two distinct training timescales
Linear increase in memorization time with dataset size
Implicit dynamical regularization prevents overfitting
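
The scaling claims above can be illustrated with a minimal sketch: if $\tau_\mathrm{gen}$ is constant in $n$ while $\tau_\mathrm{mem}$ grows linearly with $n$, the window of training times in which the model generalizes without memorizing widens linearly with the dataset size. The constants below (`TAU_GEN`, `C_MEM`) are hypothetical placeholders for illustration, not values from the paper.

```python
# Illustrative sketch of the paper's claimed scaling, not its actual code:
# tau_gen is constant in n, tau_mem = C_MEM * n grows linearly, so the
# generalization window [tau_gen, tau_mem] widens linearly with n.
# TAU_GEN and C_MEM are hypothetical constants chosen only for illustration.

TAU_GEN = 100.0  # early time at which high-quality samples appear (independent of n)
C_MEM = 5.0      # hypothetical slope of the memorization time tau_mem(n)


def generalization_window(n: int) -> float:
    """Width of the training-time window where the model generalizes
    without memorizing, for training-set size n."""
    tau_mem = C_MEM * n
    return max(0.0, tau_mem - TAU_GEN)


# The window widens linearly as the dataset grows:
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: window width = {generalization_window(n):.1f}")
```

Under these toy constants the window width is $5n - 100$, so stopping training anywhere inside it avoids memorization; this is the "implicit dynamical regularization" via training duration described above.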
Tony Bonnaire
Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris Cité, F-75005 Paris, France
Giulio Biroli
Professor of Theoretical Physics, ENS Paris
Statistical Physics · Condensed Matter · Complex Systems
Marc Mézard
Department of Computing Sciences, Bocconi University, Milano, Italy