Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of tight generalization guarantees for score-based diffusion models in finite-sample settings. It introduces the $(p,q)$-Wasserstein dimension to characterize the intrinsic low-dimensional structure of data, without requiring assumptions such as compact support, manifold structure, or smooth density. By analyzing finite-sample errors in the Wasserstein-$p$ distance under only finite-moment conditions, the paper establishes a convergence guarantee for diffusion models learning an unknown data distribution. Combining score matching, analysis of the forward diffusion process, and neural network approximation, it proves that the expected Wasserstein-$p$ error converges at rate $\widetilde{O}(n^{-1/d^*_{p,q}(\mu)})$, where $d^*_{p,q}(\mu)$ denotes the $(p,q)$-Wasserstein dimension of the data distribution. This rate improves substantially on existing bounds that depend on the ambient dimension, thereby mitigating the curse of dimensionality.
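The summary refers to score matching over a forward diffusion process combined with neural network approximation. The sketch below is a minimal, generic denoising score-matching trainer for a variance-preserving Gaussian forward process in PyTorch; the network architecture, noise schedule, and toy rank-deficient data are illustrative assumptions and not the paper's construction or proof device.

```python
# Minimal denoising score-matching sketch for a variance-preserving forward process:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I).
# All names (ScoreNet, alpha_bar, the toy data) are illustrative assumptions only.
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Small MLP taking (x_t, t) and returning an estimate of the score of x_t."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t], dim=-1))

def dsm_loss(model: ScoreNet, x0: torch.Tensor, alpha_bar: torch.Tensor) -> torch.Tensor:
    """Denoising score matching: regress the score of the Gaussian perturbation
    kernel, which equals -eps / sqrt(1 - alpha_bar_t)."""
    n = x0.shape[0]
    t_idx = torch.randint(0, len(alpha_bar), (n,))
    a = alpha_bar[t_idx].unsqueeze(-1)                 # (n, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps         # forward perturbation
    target = -eps / (1 - a).sqrt()                     # true conditional score
    pred = model(x_t, t_idx.float().unsqueeze(-1) / len(alpha_bar))
    # Weighting by (1 - alpha_bar_t) recovers the usual noise-prediction objective.
    return ((1 - a) * (pred - target) ** 2).mean()

# Usage on intrinsically one-dimensional data embedded in a 2-D ambient space.
x0 = torch.randn(512, 2) @ torch.tensor([[1.0, 0.0], [0.0, 0.0]])
alpha_bar = torch.linspace(0.999, 0.01, 100)
model = ScoreNet(dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    dsm_loss(model, x0, alpha_bar).backward()
    opt.step()
```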

📝 Abstract
Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution $\mu$ from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-$p$ distance. Unlike prior results, our guarantees hold for all $p \ge 1$ and require only a finite-moment assumption on $\mu$, without compact-support, manifold, or smooth-density conditions. Specifically, given $n$ i.i.d. samples from $\mu$ with finite $q$-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-$p$ error between the learned distribution $\hat{\mu}$ and $\mu$ scales as $\mathbb{E}\,\mathbb{W}_p(\hat{\mu},\mu) = \widetilde{O}\!\left(n^{-1/d^\ast_{p,q}(\mu)}\right)$, where $d^\ast_{p,q}(\mu)$ is the $(p,q)$-Wasserstein dimension of $\mu$. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(\mu)$ rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed $(p,q)$-Wasserstein dimension also extends classical Wasserstein dimension notions to distributions with unbounded support, which may be of independent theoretical interest.
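To make the curse-of-dimensionality claim concrete, the snippet below compares the stated rate $n^{-1/d^\ast_{p,q}(\mu)}$ with an ambient-dimension rate $n^{-1/D}$, ignoring the constants and logarithmic factors hidden by $\widetilde{O}$; the values $d^\ast = 8$ and $D = 784$ are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope comparison of the Wasserstein-p error rates in the abstract,
# ignoring constants and the log factors hidden by tilde-O.
# d_star (intrinsic (p,q)-Wasserstein dimension) and D (ambient dimension)
# are hypothetical values chosen only to illustrate the gap.
d_star, D = 8, 784
for n in (10**4, 10**6, 10**8):
    intrinsic_rate = n ** (-1.0 / d_star)   # this paper: n^{-1/d*_{p,q}(mu)}
    ambient_rate = n ** (-1.0 / D)          # ambient-dimension bound
    print(f"n={n:>11,}  intrinsic {intrinsic_rate:.3e}  ambient {ambient_rate:.3e}")
```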
Problem

Research questions and friction points this paper is trying to address.

score-based diffusion models
statistical generalization
intrinsic low-dimensional structure
Wasserstein distance
curse of dimensionality
Innovation

Methods, ideas, or system contributions that make the work stand out.

score-based diffusion models
Wasserstein dimension
statistical convergence
intrinsic low-dimensional structure
curse of dimensionality
Saptarshi Chakraborty
Department of Statistics, University of Michigan
Quentin Berthet
Google DeepMind, Paris
Machine learning, Statistics, Optimization
Peter L. Bartlett
Department of Statistics, University of California, Berkeley; Department of Electrical Engineering and Computer Sciences, UC Berkeley; Google DeepMind