🤖 AI Summary
This work investigates whether generative models genuinely generalize or merely memorize their training data, and clarifies what convergence between independently trained models reveals about learning the data distribution. Through an exact analysis of linear generative models, later extended to data with power-law spectra, the study shows that generalization comprises at least two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors. The former emerges continuously once the number of samples is linear in the input dimension, and it is what convergence captures; the latter appears in a sharp transition to which convergence is insensitive. The same distinction is observed in experiments with convolutional denoisers and in the data of Kadkhodaie et al., complementing the exact analytical characterization of the transition from memorization to generalization in the linear setting.
📝 Abstract
Generative neural networks learn how to produce highly realistic images from a large but finite number of examples, or do they simply memorise their training set? To settle this question, Kadkhodaie, Guth, Simoncelli and Mallat (ICLR '24) trained diffusion models independently on disjoint subsets of a dataset and showed that they converge to nearly the same density when the number of training images is large enough. This result raises two basic questions: how much data is needed for convergence, and what does convergence capture about learning the data distribution? Here, we address these questions by providing an exact analytical characterisation of the transition from memorisation to generalisation in linear generative models. We find that these models memorise at small load, while convergence emerges continuously when the number of samples is linear in the input dimension. Strikingly, we find that convergence is insensitive to recovery of the principal latent factors of the data, which are recovered in a sharp transition. After extending our approach to data with power-law spectra, we find the same distinction between convergence and latent recovery in our experiments with convolutional denoisers and in the data of Kadkhodaie et al. We thus show that generalisation in generative models decomposes into at least two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors. These objectives correspond to two different distances between the true and learnt data distributions, and only the first one is captured by convergence.
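The distinction between convergence and latent recovery can be illustrated with a small numerical sketch. Everything in it is an illustrative assumption rather than the paper's setup: the "linear generative model" is taken to be a zero-mean Gaussian whose covariance is the empirical covariance of its training set, the data follow a spiked covariance `I + theta * u u^T` with a single latent direction `u`, and the names `sample_spiked`, `fit_covariance` and `run` are hypothetical. Two models are fitted on disjoint sample sets; the sketch reports the relative distance between them (a stand-in for convergence) and the squared overlap of the top learnt direction with `u` (a stand-in for latent recovery).

```python
import numpy as np


def sample_spiked(n, d, theta, u, rng):
    """Draw n samples with covariance I + theta * u u^T (one latent factor u)."""
    z = rng.standard_normal((n, 1))      # latent coefficient, one per sample
    noise = rng.standard_normal((n, d))  # isotropic bulk
    return np.sqrt(theta) * z * u + noise


def fit_covariance(x):
    """Toy 'training': empirical covariance, assuming the mean is known to be 0."""
    return x.T @ x / x.shape[0]


def run(alpha, d=200, theta=1.0, seed=0):
    """Fit two models on disjoint sets of n = alpha * d samples each.

    Returns (gap, overlap):
      gap     -- Frobenius distance between the two learnt covariances,
                 relative to the norm of the true covariance
      overlap -- squared overlap of model A's top eigenvector with u
    """
    rng = np.random.default_rng(seed)
    u = np.zeros(d)
    u[0] = 1.0
    n = int(alpha * d)
    c_a = fit_covariance(sample_spiked(n, d, theta, u, rng))
    c_b = fit_covariance(sample_spiked(n, d, theta, u, rng))
    c_true = np.eye(d) + theta * np.outer(u, u)
    gap = np.linalg.norm(c_a - c_b) / np.linalg.norm(c_true)
    top = np.linalg.eigh(c_a)[1][:, -1]  # eigh sorts eigenvalues ascending
    overlap = float((top @ u) ** 2)
    return gap, overlap


for alpha in (0.25, 0.5, 1, 2, 4, 8, 16):
    gap, overlap = run(alpha)
    print(f"n/d = {alpha:>5}: gap = {gap:.3f}, overlap^2 = {overlap:.3f}")
```

In this toy model the gap between the two fits shrinks gradually as n/d grows, whereas the overlap with `u` is governed by a separate threshold: in the limit of large n and d, classical random-matrix theory places it at n/d = 1/theta^2 for this spiked model. At finite `d` the transition appears smoothed, and the numbers depend on the seed.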