🤖 AI Summary
Prior theoretical analyses of diffusion model training suffer from either strong assumptions—such as exact access to the empirical risk minimizer (ERM)—or exponential dependence on dimensionality, rendering them inapplicable to high-dimensional settings.
Method: We develop the first rigorous sample complexity framework for score estimation *without* requiring ERM access. Our approach leverages a structured error decomposition (statistical, approximation, and optimization components), novel non-convex score matching theory, and tight generalization error bounds. Crucially, we decouple neural network parameterization from dimensional scaling.
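Schematically, writing $\hat{s}_\theta$ for the trained score network (our illustrative notation, not necessarily the paper's), the decomposition bounds the score estimation error as

$$
\mathcal{E}(\hat{s}_\theta) \;\le\; \underbrace{\mathcal{E}_{\mathrm{stat}}(n)}_{\text{finite samples}} \;+\; \underbrace{\mathcal{E}_{\mathrm{approx}}(\mathcal{F})}_{\text{network expressivity}} \;+\; \underbrace{\mathcal{E}_{\mathrm{opt}}}_{\text{training error}},
$$

so each term can be controlled separately: generalization bounds for the statistical term, approximation theory for the network class $\mathcal{F}$, and non-convex optimization guarantees for the training error, with no step requiring an exact ERM.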
Contribution/Results: We establish a tight $\widetilde{\mathcal{O}}(\epsilon^{-6})$ sample complexity upper bound for score estimation, eliminating the exponential dependence on dimension that plagues prior work. This yields the first verifiable, assumption-light, and dimensionally favorable theoretical guarantee for high-dimensional diffusion model training, significantly advancing the theoretical foundations of generative modeling.
📝 Abstract
Diffusion models have demonstrated state-of-the-art performance across vision, language, and scientific domains. Despite this empirical success, prior theoretical analyses of their sample complexity either scale poorly with the input data dimension or rely on unrealistic assumptions such as access to exact empirical risk minimizers. In this work, we provide a principled analysis of score estimation, establishing a sample complexity bound of $\widetilde{\mathcal{O}}(\epsilon^{-6})$. Our approach leverages a structured decomposition of the score estimation error into statistical, approximation, and optimization errors, enabling us to eliminate the exponential dependence on neural network parameters that arises in prior analyses. This is the first such result to achieve sample complexity bounds without assuming access to the empirical risk minimizer of the score estimation loss.
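For concreteness, here is a minimal sketch of the denoising score matching objective whose estimation error the paper analyzes; the architecture, noise schedule, and toy data below are our illustrative assumptions, not the paper's construction.

```python
# Minimal denoising score matching (DSM) sketch. All names and
# hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn

dim = 2
score_net = nn.Sequential(  # small MLP standing in for the analyzed network class
    nn.Linear(dim + 1, 64), nn.SiLU(),
    nn.Linear(64, 64), nn.SiLU(),
    nn.Linear(64, dim),
)
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)

def dsm_loss(x0: torch.Tensor) -> torch.Tensor:
    """DSM objective: regress onto the score of the Gaussian perturbation kernel.

    For x_t = x_0 + sigma_t * z with z ~ N(0, I), the conditional score is
    -z / sigma_t, so minimizing this loss estimates the true score in L2.
    """
    t = torch.rand(x0.shape[0], 1)       # random noise levels in (0, 1)
    sigma = 0.01 + t * (1.0 - 0.01)      # illustrative linear sigma schedule
    z = torch.randn_like(x0)
    x_t = x0 + sigma * z
    pred = score_net(torch.cat([x_t, t], dim=1))
    target = -z / sigma
    # sigma^2 weighting keeps regression targets bounded across noise levels
    return ((sigma * (pred - target)) ** 2).sum(dim=1).mean()

# Toy training loop on synthetic two-mode data (stand-in for real samples).
for step in range(1000):
    shift = torch.sign(torch.randn(128, 1)) * 2.0   # two well-separated modes
    x0 = torch.randn(128, dim) + shift
    loss = dsm_loss(x0)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The paper's question, in these terms, is how many samples `x0` are needed for a network trained this way (by an ordinary first-order method, with no exact ERM oracle) to estimate the score to accuracy $\epsilon$.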