A theory of learning data statistics in diffusion models, from easy to hard

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
How diffusion models progressively learn statistical structure, from simple to complex, remains poorly understood. This work addresses the gap by combining a cumulant-based mixture data model with a simplified denoiser and tools from statistical learning theory, showing that diffusion models first capture pairwise statistics before gradually learning higher-order correlations. The authors introduce a scalar invariant, the "diffusion information exponent," and prove that pairwise statistics are learnable with linear sample complexity, whereas fourth-order cumulants generally require at least cubic complexity, which reduces to linear when pairwise and higher-order statistics share a correlated latent structure. These findings explain diffusion models' inherent preference for distributional simplicity and quantify how latent correlation structure lowers the sample complexity of learning higher-order dependencies.

📝 Abstract
While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.
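The abstract describes a minimal data model, the mixed cumulant model, in which pairwise and higher-order input correlations are controlled separately. The paper's exact parameterization is not reproduced here; the sketch below is only a loose illustration of the idea, with all direction names, latent choices, and coefficients (`u`, `v`, `beta`, `gamma`) being assumptions: a Gaussian latent plants a covariance (second-cumulant) spike along one direction, while a Rademacher latent (fourth cumulant \(\kappa_4 = -2\)) plants a higher-order spike along another.

```python
import numpy as np

# Toy "mixed cumulant"-style data. Hypothetical parameterization, not the paper's.
rng = np.random.default_rng(0)
d, n = 100, 10_000
beta, gamma = 2.0, 3.0                      # spike strengths (assumed values)

# Planted unit directions for the two spikes
u = rng.standard_normal(d); u /= np.linalg.norm(u)
v = rng.standard_normal(d); v /= np.linalg.norm(v)

z = rng.standard_normal((n, d))             # isotropic Gaussian background
h = rng.standard_normal(n)                  # Gaussian latent -> covariance spike along u
s = rng.choice([-1.0, 1.0], size=n)         # Rademacher latent -> fourth-cumulant spike along v

x = z + np.sqrt(beta) * h[:, None] * u + gamma * s[:, None] * v

# Pairwise statistic: variance along u is inflated to ~ 1 + beta
proj_u = x @ u
var_u = proj_u.var()

# Higher-order statistic: excess kurtosis along v is nonzero (negative here),
# something no Gaussian model can reproduce
proj_v = x @ v - (x @ v).mean()
kurt_v = (proj_v**4).mean() / proj_v.var() ** 2 - 3.0

print(f"variance along u: {var_u:.2f} (isotropic baseline: 1.0)")
print(f"excess kurtosis along v: {kurt_v:.2f} (Gaussian baseline: 0.0)")
```

Measuring a denoiser's progress against statistics like `var_u` and `kurt_v` is one way to see the easy-to-hard ordering the abstract describes: the covariance spike is detectable from few samples, while the fourth-cumulant signal emerges only later.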
Problem

Research questions and friction points this paper is trying to address.

diffusion models
learning dynamics
statistical learning
sample complexity
higher-order correlations
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models
sample complexity
cumulants
learning dynamics
diffusion information exponent