🤖 AI Summary
This work investigates the learning behavior of diffusion models when the data distribution is supported on a low-dimensional manifold, focusing on how sample complexity relates to manifold geometry in denoising score matching. By parameterizing the score function with a random feature neural network and combining high-dimensional asymptotic analysis with manifold learning theory, the study derives, for the first time, closed-form expressions for the training, test, and score estimation errors, yielding precise learning curves. The key finding is that on linear manifolds the sample complexity scales linearly with the intrinsic dimension alone, whereas this advantage diminishes markedly on nonlinear manifolds, revealing the subtle yet critical influence of manifold curvature on learning performance.
📝 Abstract
We study the theoretical behavior of denoising score matching, the learning task associated with diffusion models, when the data distribution is supported on a low-dimensional manifold and the score is parameterized using a random feature neural network. We derive asymptotically exact expressions for the test, train, and score errors in the high-dimensional limit. Our analysis reveals that, for linear manifolds, the sample complexity required to learn the score function scales linearly with the intrinsic dimension of the manifold, rather than with the ambient dimension. Perhaps surprisingly, the benefits of low-dimensional structure start to diminish once the manifold is non-linear. These results indicate that diffusion models can benefit from structured data; however, the dependence on the specific type of structure is subtle and intricate.
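To make the setup concrete, here is a minimal, illustrative sketch of the kind of learning task the abstract describes: denoising score matching with a random feature model (fixed random first layer, trained linear readout) on data lying in a random linear subspace. All dimensions, the noise level, the `tanh` nonlinearity, and the ridge parameter are hypothetical choices for illustration, not values or definitions taken from the paper.

```python
import numpy as np

# Hypothetical sizes (not from the paper): ambient dim D, intrinsic dim d,
# n training samples, p random features, noise level sigma, ridge lam.
D, d, n, p, sigma, lam = 50, 5, 2000, 500, 0.5, 1e-3

rng = np.random.default_rng(0)

# Data on a linear manifold: a random d-dimensional subspace of R^D.
U, _ = np.linalg.qr(rng.standard_normal((D, d)))
X = rng.standard_normal((n, d)) @ U.T          # clean samples, shape (n, D)

# Denoising score matching targets: for x_noisy = x + sigma * z with
# Gaussian z, the regression target is the conditional score -z / sigma.
Z = rng.standard_normal((n, D))
X_noisy = X + sigma * Z
Y = -Z / sigma

# Random feature map: fixed random first layer F, trained readout W.
F = rng.standard_normal((p, D)) / np.sqrt(D)
Phi = np.tanh(X_noisy @ F.T)                   # features, shape (n, p)

# The model is linear in W, so ridge regression has a closed form.
A = Phi.T @ Phi / n + lam * np.eye(p)
W = np.linalg.solve(A, Phi.T @ Y / n)          # readout, shape (p, D)

def score(x):
    """Estimated score at noise level sigma."""
    return np.tanh(x @ F.T) @ W

train_err = np.mean((Phi @ W - Y) ** 2)
print(f"train error: {train_err:.4f}")
```

In this toy version, one would vary `d` at fixed `D` and track how the error decays with `n`; the paper's contribution is to characterize such learning curves exactly in the high-dimensional limit rather than by simulation.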