When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper challenges the prevailing view that score-based methods (e.g., diffusion models) learn the full data distribution in the low-noise limit, arguing instead that their success fundamentally stems from implicit learning of the data manifold's geometric structure. Method: Through a small-noise asymptotic analysis, the authors identify a Θ(σ⁻²) scale separation in the score function: manifold geometry governs the leading-order behavior, while distributional details contribute only at higher order. Contribution/Results: They prove that recovering the support (i.e., the manifold) requires only o(σ⁻²) score estimation error, and that learning the uniform distribution on the manifold, or the maximum-entropy prior in Bayesian inverse problems, tolerates O(σ⁻²) larger errors than generic distributional learning. Connecting diffusion modeling with Bayesian inverse problem theory, they empirically validate manifold-learning dominance on large-scale models, including Stable Diffusion, thereby shifting the paradigm from "distribution fitting" to "manifold learning."

📝 Abstract
Score-based methods, such as diffusion models and Bayesian inverse problems, are often interpreted as learning the data distribution in the low-noise limit ($σ \to 0$). In this work, we propose an alternative perspective: their success arises from implicitly learning the data manifold rather than the full distribution. Our claim is based on a novel analysis of scores in the small-$σ$ regime that reveals a sharp separation of scales: information about the data manifold is $Θ(σ^{-2})$ stronger than information about the distribution. We argue that this insight suggests a paradigm shift from the less practical goal of distributional learning to the more attainable task of geometric learning, which provably tolerates $O(σ^{-2})$ larger errors in score approximation. We illustrate this perspective through three consequences: i) in diffusion models, concentration on the data support can be achieved with a score error of $o(σ^{-2})$, whereas recovering the specific data distribution requires a much stricter $o(1)$ error; ii) more surprisingly, learning the uniform distribution on the manifold (an especially structured and useful object) is also $O(σ^{-2})$ easier; and iii) in Bayesian inverse problems, the maximum entropy prior is $O(σ^{-2})$ more robust to score errors than generic priors. Finally, we validate our theoretical findings with preliminary experiments on large-scale models, including Stable Diffusion.
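The claimed scale separation can be seen numerically in a toy setting (our own sketch, not one of the paper's experiments): take data supported on the unit circle in the plane with a non-uniform density, compute the exact score of the Gaussian-smoothed distribution by quadrature, and split it into a component normal to the manifold (geometry) and a component tangent to it (distribution). As σ shrinks with the query point held a fixed distance δ off the manifold, the normal part grows like σ⁻² while the tangential part, which encodes the density, stays O(1). All names and parameter choices below are illustrative.

```python
import numpy as np

# Data on the unit circle S^1 in R^2 with density p(theta) ∝ 1 + 0.5*cos(theta).
# A fine grid gives an accurate quadrature approximation of the smoothed score.
M = 200_000
theta = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
density = 1.0 + 0.5 * np.cos(theta)      # unnormalized density on the circle
Y = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def smoothed_score(x, sigma):
    """grad log p_sigma(x) = (E[y | x] - x) / sigma^2, with Gaussian posterior weights."""
    sq = np.sum((Y - x) ** 2, axis=1)
    logw = -sq / (2.0 * sigma ** 2) + np.log(density)
    w = np.exp(logw - logw.max())        # stabilized weights
    y_post = (w[:, None] * Y).sum(axis=0) / w.sum()
    return (y_post - x) / sigma ** 2

theta0 = np.pi / 2
delta = 0.05                             # fixed off-manifold offset of the query point
normal = np.array([np.cos(theta0), np.sin(theta0)])    # outward normal at the projection
tangent = np.array([-np.sin(theta0), np.cos(theta0)])
x = (1.0 + delta) * normal

results = {}
for sigma in (0.1, 0.05, 0.025):
    s = smoothed_score(x, sigma)
    results[sigma] = (float(s @ normal), float(s @ tangent))
    print(f"sigma={sigma:5.3f}  normal={results[sigma][0]:8.2f}"
          f"  tangential={results[sigma][1]:6.2f}")
```

The normal component tracks $-δ/σ^2$ (pointing back to the manifold), so halving σ roughly quadruples it, whereas the tangential component stays near $\partial_\theta \log p(\theta_0) \approx -0.5$ for every σ, matching the paper's claim that the manifold signal dominates the distributional signal by a factor of Θ(σ⁻²).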
Problem

Research questions and friction points this paper is trying to address.

Analyzing score-based methods' geometric learning versus distributional learning
Revealing scale separation between manifold and distribution information
Demonstrating error tolerance advantages in geometric learning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning data manifold geometry instead of full distribution
Revealing scale separation with Θ(σ⁻²) stronger manifold information
Provably tolerating O(σ⁻²) larger errors in score approximation
Xiang Li
Department of Computer Science, ETH Zurich, Switzerland
Zebang Shen
Department of Computer Science, ETH Zurich, Switzerland
Ya-Ping Hsieh
ETH Zürich
Niao He
Associate Professor, ETH Zürich
Optimization · Machine Learning · Reinforcement Learning