Diffusion Models Are Statistically Optimal for Learning Low-Dimensional Multi-Modal Distributions

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the statistical efficiency of score-based diffusion models for learning high-dimensional, multi-modal distributions supported on a union of low-dimensional subspaces. Existing theoretical analyses often rely on strong assumptions such as smoothness, bounded density, or log-concavity. Under the milder assumption that the data distribution within each subspace is subgaussian, the paper establishes convergence guarantees in 1-Wasserstein distance. The key result is a sample complexity upper bound of $\widetilde{O}(\varepsilon^{-(k \vee 2)})$, where $k$ is the intrinsic dimension of the data and $k \vee 2 = \max(k, 2)$. The bound does not depend on the ambient dimension, so it avoids the curse of dimensionality that affects prior guarantees.

📝 Abstract

Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of their statistical efficiency remains limited. Existing theories typically rely on strong regularity assumptions, such as uniformly bounded densities or globally smooth score functions, which fail to capture such intrinsic structures. In this work, we study the sample complexity of diffusion models for learning distributions supported on a union of low-dimensional subspaces. Assuming that the data distribution within each subspace is subgaussian, we show that diffusion models require at most $\widetilde{O}(\varepsilon^{-k \vee 2})$ samples to achieve $\varepsilon$ error in 1-Wasserstein distance, where $k$ is the intrinsic dimension. This near-optimal convergence rate depends only on the intrinsic dimension and significantly improves upon prior theoretical guarantees that suffer from the curse of dimensionality. Notably, our analysis applies to a broad collection of distributions without imposing smoothness, bounded-density, or log-concavity assumptions. Overall, our results show that diffusion models can statistically adapt to intrinsic low-dimensional structure while naturally accommodating multi-modal data, offering a rigorous theoretical justification for their success in complex high-dimensional learning tasks.
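
To get a feel for the stated rate, the following minimal sketch evaluates the leading-order term $\varepsilon^{-(k \vee 2)}$, reading the exponent as $\max(k, 2)$. It drops the constants and logarithmic factors hidden by $\widetilde{O}$, so the numbers are orders of magnitude only, not sample sizes from the paper. The function names are hypothetical, and the comparison rate $\varepsilon^{-D}$ for an ambient dimension $D$ is an assumption made purely for contrast; the abstract does not state the exact form of prior bounds.

```python
def intrinsic_rate_samples(eps: float, k: int) -> float:
    """Leading-order term eps^{-(k v 2)} with k v 2 = max(k, 2).

    Constants and log factors hidden by the O-tilde are ignored, so this is
    an illustration of scaling, not a sample size taken from the paper.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return eps ** (-max(k, 2))


def ambient_rate_samples(eps: float, ambient_dim: int) -> float:
    """Hypothetical ambient-dimension rate eps^{-D}, used only for contrast.

    ASSUMPTION: this form is not stated in the source; it stands in for a
    generic bound that suffers from the curse of dimensionality.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if ambient_dim < 1:
        raise ValueError("ambient_dim must be a positive integer")
    return eps ** (-ambient_dim)


if __name__ == "__main__":
    eps = 0.1
    ambient_dim = 100  # hypothetical ambient dimension
    for k in (1, 2, 3, 5):
        n_intrinsic = intrinsic_rate_samples(eps, k)
        n_ambient = ambient_rate_samples(eps, ambient_dim)
        print(f"k={k}: intrinsic ~ {n_intrinsic:.3g}, ambient (D={ambient_dim}) ~ {n_ambient:.3g}")
```

Note that `intrinsic_rate_samples` takes no ambient-dimension argument, mirroring the claim that the rate depends only on $k$. For $k \le 2$ the exponent is fixed at 2, so the leading term is the same for $k = 1$ and $k = 2$.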

Problem

Research questions and friction points this paper is trying to address.

diffusion models

low-dimensional structure

multi-modal distributions

statistical efficiency

sample complexity

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models

statistical optimality

low-dimensional structure