🤖 AI Summary
This work addresses the lack of intrinsic convergence-rate theory for score-based generative models on low-dimensional smooth manifolds embedded in high-dimensional spaces. The authors propose an analytical framework based on a noise-scale decomposition, distinguishing a tangent-space regime under large noise from a projection-centered regime under small noise. They establish, for the first time, non-asymptotic Wasserstein-1 convergence rates whose sample exponent depends only on the intrinsic manifold dimension $d$, with the ambient dimension $D$ entering only through a polynomial prefactor. For Hölder-smooth densities on compact manifolds, the resulting rate is the near-optimal $\widetilde{O}(n^{-(\beta+1)/(d+2\beta)})$. The construction combines a finite set of intrinsic anchor points, explicit nearest-point projections computed by Gauss-Newton iterations, ReLU networks, and a variance-preserving estimator. For families with polynomially controlled geometry and density lower bounds, the network parameters depend polynomially on the ambient dimension, which enables geometry-aware, sample-efficient generative modeling.
📝 Abstract
Score-based generative models are trained in high-dimensional ambient spaces, yet many data distributions are supported on low-dimensional nonlinear structures. We prove that, for compact $d$-dimensional smooth manifolds $\mathcal{M} \subset [0,1]^D$ with $d > 2$ and $\beta$-Hölder densities strictly positive on $\mathcal{M}$, a variance-preserving SGM estimator attains the Wasserstein-1 rate $\tilde{\mathcal{O}}(D^{\mathcal{O}_\beta(d)} n^{-(\beta+1)/(d+2\beta)})$, whose sample exponent is intrinsic, up to logarithmic factors and explicit geometry and density factors. The full nonasymptotic bound explicitly isolates the finite-order geometry envelope, Hölder radius, density lower bound, ambient dependence, and finite-order correction terms. The analysis separates score approximation into a large-noise tangent-cell regime and a small-noise projection-centered, de-Gaussianized Laplace regime. The key technical ingredient is a ReLU implementation of nearest-projection coordinates via finite intrinsic anchors and Gauss-Newton iterations, rather than an approximation of the manifold projection as a black-box high-dimensional smooth map. Consequently, for families with polynomially controlled geometry and density lower bounds, the constructed score-network parameters have polynomial ambient dependence.
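To make the anchor-plus-Gauss-Newton idea concrete, here is a minimal numerical sketch in Python. It is not the paper's construction, which implements these steps with ReLU networks. The example manifold (a unit circle in $\mathbb{R}^3$, so $d = 1$ and $D = 3$), the chart `phi`, its Jacobian `jac`, the function `project`, the number of anchors, and the step count are all assumptions chosen for illustration.

```python
import numpy as np

def phi(t):
    """Hypothetical chart: unit circle embedded in R^3 (illustrative only)."""
    return np.array([np.cos(t), np.sin(t), 0.0])

def jac(t):
    """Jacobian of phi with respect to t, a D x d matrix (here 3 x 1)."""
    return np.array([[-np.sin(t)], [np.cos(t)], [0.0]])

def project(x, anchors, steps=8):
    """Approximate the nearest-point projection of x onto the circle.

    Starts from the anchor whose image is closest to x, then refines the
    intrinsic coordinate t with Gauss-Newton steps on ||phi(t) - x||^2.
    """
    dists = [np.linalg.norm(phi(a) - x) for a in anchors]
    t = float(anchors[int(np.argmin(dists))])
    for _ in range(steps):
        J = jac(t)
        r = phi(t) - x
        # Gauss-Newton step: solve (J^T J) delta = -J^T r
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        t += float(delta[0])
    return t, phi(t)

# Finite set of intrinsic anchors (assumed: 16 equally spaced angles).
anchors = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)

# A point slightly off the manifold, near the angle 0.7.
x = np.array([1.1 * np.cos(0.7), 1.1 * np.sin(0.7), 0.05])
t_hat, p_hat = project(x, anchors)
```

In this toy setting the anchor search only has to place the iterate in the basin of the correct projection. Because the residual at the projection is nonzero, the Gauss-Newton refinement converges linearly rather than quadratically, so a handful of steps is needed to recover the intrinsic coordinate to high accuracy.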