🤖 AI Summary
This work investigates the memorization-generalization phase transition in diffusion models as the scale of the training data varies. We propose a *correlational memory* perspective: training corresponds to memory encoding, while generation implements memory retrieval. We establish, for the first time, a theoretical connection between diffusion models and Hopfield networks, deriving necessary and sufficient conditions for the emergence of *spurious attractors*, i.e., hallucinated states, at the critical memory-load threshold. Through energy-landscape analysis, dynamical-systems modeling, and empirical validation on DDPM and DDIM, we confirm the universality of this phenomenon. Results show that models operate predominantly in a memorization mode in the small-data regime, shift toward generalization at large data scales, and exhibit spurious attractors in the critical regime, unifying the explanations of memory overload and implicit manifold learning. This work provides a cross-disciplinary theoretical framework and falsifiable predictions for understanding the intrinsic mechanisms of diffusion models.
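For context, the classical Hopfield model referenced here is standard textbook material (not a result of this work): patterns $\xi^{\mu}$ are stored via the Hebbian rule, retrieval descends an energy landscape, and the critical memory load is the ratio of stored patterns to neurons:

$$
E(\mathbf{s}) = -\frac{1}{2}\sum_{i \neq j} w_{ij}\, s_i s_j,
\qquad
w_{ij} = \frac{1}{N}\sum_{\mu=1}^{P} \xi_i^{\mu}\xi_j^{\mu},
\qquad
\alpha_c = \frac{P}{N} \approx 0.138,
$$

where $\alpha_c \approx 0.138$ is the classical capacity of the standard (pairwise) Hopfield network; the precise threshold for the diffusion-model setting studied here may differ.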
📝 Abstract
Hopfield networks are associative memory (AM) systems, designed for storing and retrieving patterns as local minima of an energy landscape. In the classical Hopfield model, an interesting phenomenon occurs when the amount of training data reaches its critical memory load: *spurious states*, or unintended stable points, emerge at the end of the retrieval dynamics, leading to incorrect recall. In this work, we examine diffusion models, commonly used in generative modeling, from the perspective of AMs. The training phase of a diffusion model is conceptualized as memory encoding (the training data is stored in the memory). The generation phase is viewed as an attempt at memory retrieval. In the small-data regime the diffusion model exhibits a strong memorization phase, where the network creates distinct basins of attraction around each sample in the training set, akin to the Hopfield model below the critical memory load. In the large-data regime, a different phase appears, where an increase in the size of the training set fosters the creation of new attractor states that correspond to manifolds of the generated samples. Spurious states appear at the boundary of this transition and correspond to emergent attractor states, which are absent from the training set but, at the same time, have distinct basins of attraction around them. Our findings provide a novel perspective on the memorization-generalization phenomenon in diffusion models through the lens of AMs, a theoretical prediction of the existence of spurious states, and empirical validation of this prediction in commonly used diffusion models.
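The abstract's notion of a spurious state can be made concrete with a minimal classical Hopfield sketch. The code below (an illustration with standard textbook choices of $N$, $P$, and the update rule, not taken from this paper) stores three random bipolar patterns with the Hebbian rule, then shows that their elementwise majority vote is a stable fixed point of the retrieval dynamics: an attractor that was never in the "training set".

```python
import numpy as np

# Illustrative sketch only: a classical Hopfield network with Hebbian storage,
# exhibiting a spurious attractor (the symmetric 3-pattern mixture state).
rng = np.random.default_rng(0)
N, P = 200, 3                                # neurons, stored patterns (far below capacity)

patterns = rng.choice([-1, 1], size=(P, N))  # random bipolar patterns
W = (patterns.T @ patterns) / N              # Hebbian weight matrix
np.fill_diagonal(W, 0)                       # no self-connections

def retrieve(state, steps=50):
    """Synchronous retrieval dynamics: s <- sign(W s), iterated to a fixed point."""
    for _ in range(steps):
        new = np.sign(W @ state)
        new[new == 0] = 1                    # break ties deterministically
        if np.array_equal(new, state):
            break
        state = new
    return state

# The elementwise majority vote of three stored patterns is a classic
# spurious state: stable under the dynamics, yet absent from the stored set.
mixture = np.sign(patterns.sum(axis=0))
fixed = retrieve(mixture.copy())

overlaps = patterns @ fixed / N              # overlap with each stored pattern
print(overlaps)                              # partial overlaps near 0.5, none near 1
```

The retrieved state has roughly equal partial overlap with all three stored patterns, i.e., it recalls none of them correctly, which is exactly the "incorrect recall" role spurious states play in the abstract's analogy.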