When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates when probability flows in diffusion models converge to training samples, to convex combinations of them, or to novel points in the interior of the data manifold, in order to explain how memorization and generalization coexist. It proposes a simplified score-flow analysis framework built on orthogonal datasets and obtuse simplicial structures, combined with shallow ReLU denoisers, minimum-ℓ²-norm training, and a controllable diffusion time scheduler, enabling systematic characterization of probability flow trajectories. Theoretically and empirically, the paper identifies three distinct convergence regimes (a training point, a sum of training points, or a more general manifold point); shows that early stopping by the diffusion time scheduler steers the flow toward manifold-interior points; and demonstrates that memorization diminishes as the number of training samples grows. Throughout, the role of shallow networks in diffusion modeling is characterized through the lens of inductive bias.

📝 Abstract
While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold. We analyze this by studying the probability flow of shallow ReLU neural network denoisers trained with minimal $\ell^2$ norm. For intuition, we introduce a simpler score flow and show that for orthogonal datasets, both flows follow similar trajectories, converging to a training point or a sum of training points. However, early stopping by the diffusion time scheduler allows probability flow to reach more general manifold points. This reflects the tendency of diffusion models to both memorize training samples and generate novel points that combine aspects of multiple samples, motivating our study of such behavior in simplified settings. We extend these results to obtuse simplex data and, through simulations in the orthogonal case, confirm that probability flow converges to a training point, a sum of training points, or a manifold point. Moreover, memorization decreases when the number of training samples grows, as fewer samples accumulate near training points.
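As a rough illustration of the score-flow idea in the abstract (a sketch only, not the paper's trained-denoiser construction — the function names and the noise level `sigma` are our choices): for an orthogonal dataset, the score of the Gaussian-smoothed empirical distribution has a closed form, and integrating dx/dt = score drags an initial point toward its nearest training sample, i.e. memorization.

```python
# Sketch of "score flow" on an orthogonal dataset. Instead of the
# paper's minimal-l2-norm shallow ReLU denoiser, we use the closed-form
# score of the smoothed empirical density (1/n) * sum_i N(x; x_i, sigma^2 I)
# as a stand-in; sigma, dt, and steps are illustrative choices.
import numpy as np

def empirical_score(x, data, sigma):
    """Score (gradient of the log-density) of the Gaussian-smoothed
    empirical distribution of the rows of `data`, evaluated at x."""
    diffs = data - x                                  # (n, d)
    logits = -np.sum(diffs**2, axis=1) / (2 * sigma**2)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                      # softmax responsibilities
    return (w[:, None] * diffs).sum(axis=0) / sigma**2

def score_flow(x0, data, sigma=0.5, dt=1e-2, steps=2000):
    """Integrate dx/dt = score(x) with forward Euler."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = x + dt * empirical_score(x, data, sigma)
    return x

# Orthogonal dataset: the standard basis vectors of R^3.
data = np.eye(3)
x_end = score_flow(np.array([0.9, 0.2, 0.1]), data)
# Started nearest e_1, so the flow drifts onto (a slightly shrunk copy of) e_1.
```

The endpoint is a slightly shrunk training point rather than e₁ exactly, because Gaussian smoothing pulls every mode a little toward the data mean; as sigma shrinks the fixed point approaches the training sample itself.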
Problem

Research questions and friction points this paper is trying to address.

Understanding when probability flow converges to training samples
Analyzing behavior of shallow ReLU neural network denoisers
Exploring memorization and generalization in diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shallow ReLU neural network denoisers
Denoisers trained with minimal ℓ² norm
Early stopping via the diffusion time scheduler
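The early-stopping mechanism above can be sketched with the same empirical Gaussian-mixture score used as a stand-in for the trained denoiser (our construction, not the paper's): stopping the probability flow at an earlier diffusion time corresponds to a larger residual noise level sigma, and at large sigma the flow settles at an interior manifold point instead of collapsing onto a training sample. The function names and the two sigma values are illustrative assumptions.

```python
# Sketch: the same score flow run at two noise levels. Small sigma
# (late diffusion time) memorizes a training point; large sigma
# ("early stopping") ends at a point interior to the data simplex.
import numpy as np

def empirical_score(x, data, sigma):
    """Score of the smoothed empirical density (1/n) * sum_i N(x; x_i, sigma^2 I)."""
    diffs = data - x
    logits = -np.sum(diffs**2, axis=1) / (2 * sigma**2)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return (w[:, None] * diffs).sum(axis=0) / sigma**2

def score_flow(x0, data, sigma, dt=5e-3, steps=3000):
    """Integrate dx/dt = score(x) with forward Euler at fixed sigma."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = x + dt * empirical_score(x, data, sigma)
    return x

data = np.eye(3)                  # orthogonal training set in R^3
x0 = np.array([0.45, 0.40, 0.05]) # slightly closer to e_1 than to e_2
memorized = score_flow(x0, data, sigma=0.3)  # low noise: collapses onto e_1
interior  = score_flow(x0, data, sigma=0.8)  # early stop: stays near the centroid
```

For this dataset the centroid is a stable fixed point of the flow exactly when sigma² > 1/3, so sigma = 0.8 lands near (1/3, 1/3, 1/3) — a "sum of training points" style interior point — while sigma = 0.3 breaks the symmetry and memorizes the nearest sample.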