🤖 AI Summary
This paper challenges the prevailing view that score-based methods (e.g., diffusion models) learn the full data distribution in the low-noise limit, arguing instead that their success stems from implicitly learning the geometric structure of the data manifold. Method: a small-noise asymptotic analysis identifies a Θ(σ⁻²) scale separation in the score function: manifold geometry governs the leading-order behavior, while distributional details contribute only at higher order. Contribution/Results: the authors prove that recovering the support (i.e., the manifold) requires only o(σ⁻²) score-estimation error, whereas recovering the specific distribution requires a much stricter o(1) error; the uniform distribution on the manifold and maximum-entropy priors in Bayesian inverse problems likewise tolerate O(σ⁻²) larger errors. They empirically validate the dominance of manifold learning on large-scale models, including Stable Diffusion, shifting the paradigm from "distribution fitting" to "manifold learning."
📝 Abstract
Score-based methods, such as diffusion models and Bayesian inverse problems, are often interpreted as learning the data distribution in the low-noise limit ($σ \to 0$). In this work, we propose an alternative perspective: their success arises from implicitly learning the data manifold rather than the full distribution. Our claim is based on a novel analysis of scores in the small-$σ$ regime that reveals a sharp separation of scales: information about the data manifold is $Θ(σ^{-2})$ stronger than information about the distribution. We argue that this insight suggests a paradigm shift from the less practical goal of distributional learning to the more attainable task of geometric learning, which provably tolerates $O(σ^{-2})$ larger errors in score approximation. We illustrate this perspective through three consequences: i) in diffusion models, concentration on the data support can be achieved with a score error of $o(σ^{-2})$, whereas recovering the specific data distribution requires a much stricter $o(1)$ error; ii) more surprisingly, learning the uniform distribution on the manifold, an especially structured and useful object, is also $O(σ^{-2})$ easier; and iii) in Bayesian inverse problems, the maximum entropy prior is $O(σ^{-2})$ more robust to score errors than generic priors. Finally, we validate our theoretical findings with preliminary experiments on large-scale models, including Stable Diffusion.
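The claimed separation of scales can be sanity-checked numerically. The sketch below is an illustrative assumption, not the paper's construction: data lie on the unit circle in $R^2$ with a non-uniform density $q(θ) \propto 1 + \cos θ$, smoothed by Gaussian noise of scale $σ$. At a point held a fixed distance $δ$ off the manifold, the score component normal to the circle (geometric information pulling back to the support) should scale like $σ^{-2}$, while the tangential component (distributional detail) should stay $O(1)$.

```python
import numpy as np

def score(x, sigma, n=40000):
    """Score grad log p_sigma(x) for p_sigma = (circle data) * N(0, sigma^2 I), by quadrature."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    q = 1.0 + np.cos(theta)                      # unnormalized density on the circle
    pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    diff = x - pts                               # shape (n, 2)
    logw = -np.sum(diff**2, axis=1) / (2.0 * sigma**2)
    w = q * np.exp(logw - logw.max())            # numerically stabilized weights
    return -(w[:, None] * diff).sum(axis=0) / (w.sum() * sigma**2)

theta0, delta = 0.5, 0.05                        # query angle; fixed offset from the manifold
normal = np.array([np.cos(theta0), np.sin(theta0)])
tangent = np.array([-np.sin(theta0), np.cos(theta0)])
x = (1.0 + delta) * normal                       # point at distance delta off the circle

for sigma in (0.05, 0.025):
    s = score(x, sigma)
    print(f"sigma={sigma}: normal component {s @ normal:+.1f}, "
          f"tangential component {s @ tangent:+.3f}")
```

Halving $σ$ roughly quadruples the normal component (consistent with the leading $-δ/σ^2$ term), while the tangential component, which reflects $\partial_θ \log q$, barely moves.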