🤖 AI Summary
This work addresses the limited theoretical understanding of how statistical complexity and geometric structure affect diffusion models applied to high-dimensional data residing on low-dimensional manifolds. Modeling data as samples supported on a smooth Riemannian manifold, the study reveals an intrinsic–extrinsic decomposition of the score function across noise levels and quantifies how manifold curvature shapes this decomposition. Building on these insights, the authors develop a neural network approximation scheme tailored to the manifold's geometry and establish, for the first time, statistical convergence rates for score estimation and distribution learning that depend explicitly on the intrinsic dimension and curvature of the manifold. This provides a rigorous theoretical foundation for diffusion-based generative modeling on manifolds.
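To make the intrinsic–extrinsic split concrete, here is a heuristic sketch of the small-noise regime. The notation is assumed for illustration rather than taken from the paper: $\pi_{\mathcal{M}}$ denotes the nearest-point projection onto the manifold $\mathcal{M}$, $P_{T_y\mathcal{M}}$ the orthogonal projection onto the tangent space at $y$, and $p_0$ the data density on $\mathcal{M}$; the paper's precise decomposition may differ.

```latex
% Heuristic small-noise decomposition of the score (illustrative sketch,
% not the paper's exact statement). For x near M and noise level sigma:
\[
  \nabla_x \log p_\sigma(x)
  \;\approx\;
  \underbrace{\frac{\pi_{\mathcal{M}}(x) - x}{\sigma^2}}_{\text{extrinsic: pulls } x \text{ toward } \mathcal{M}}
  \;+\;
  \underbrace{P_{T_{\pi_{\mathcal{M}}(x)}\mathcal{M}}\,
    \nabla \log p_0\!\big(\pi_{\mathcal{M}}(x)\big)}_{\text{intrinsic: tangential score along } \mathcal{M}}
\]
% Curvature enters through the reach of M: the projection pi_M is only
% well-defined inside a tube of radius reach(M), and higher curvature
% shrinks this tube and inflates the correction terms hidden in the "≈".
```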
📝 Abstract
Diffusion models have become a leading framework in generative modeling, yet their theoretical understanding -- especially for high-dimensional data concentrated on low-dimensional structures -- remains incomplete. This paper investigates how diffusion models learn such structured data, focusing on two key aspects: statistical complexity and the influence of the data's geometric properties. By modeling data as samples from a smooth Riemannian manifold, our analysis reveals crucial decompositions of the score function in diffusion models under different levels of injected noise. We also highlight the interplay between manifold curvature and the structure of the score function. These analyses enable an efficient neural network approximation of the score function, upon which we further establish statistical rates for score estimation and distribution learning. Remarkably, the obtained rates are governed by the intrinsic dimension of the data and the manifold curvature. These results advance the statistical foundations of diffusion models, bridging theory and practice for generative modeling on manifolds.
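For context, here is a minimal sketch of the flavor of guarantee that intrinsic-dimension-adaptive analyses typically produce, with symbols assumed for illustration rather than taken from the paper: $n$ is the sample size, $d$ the intrinsic dimension of $\mathcal{M}$ (not the ambient dimension $D$), and $s$ a smoothness index of the data density; the paper's exact exponents and curvature dependence may differ.

```latex
% Typical intrinsic-dimension-adaptive rate for distribution learning
% (illustrative form only; constants and log factors generally depend on
% curvature, e.g. through the reach of M, and on the smoothness class).
\[
  \mathbb{E}\,\mathrm{TV}\!\left(\widehat{p}_n,\; p_0\right)
  \;\lesssim\;
  n^{-\frac{s}{2s + d}} \,(\log n)^{c},
  \qquad \text{with exponent depending on } d, \text{ not } D.
\]
```

The point mirrored in the abstract is that the exponent involves the intrinsic dimension $d$ rather than the ambient dimension $D$, which is what makes such guarantees meaningful for high-dimensional data concentrated on low-dimensional structures.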