🤖 AI Summary
This work addresses the problem of generating noise-free samples from score-based diffusion models whose training data are corrupted by noise, modeled as small perturbations off the underlying data manifold. To suppress these off-manifold noise components, we introduce the concept of an *extended score*, which treats intrinsic (tangential) and extrinsic (normal) manifold directions differently. Based on this geometric view, we modify the sampling procedure so that it suppresses off-manifold noise without requiring additional model training; the extended score is approximated from an approximation to the standard score. Analysis in a simplified setting shows that the extended score reduces small variations to zero while leaving large variations mostly unchanged. Experiments on toy problems, synthetic data, and real data demonstrate the method's efficacy. Our core contribution is a geometry-aware extension of the score that denoises at inference time without increasing model capacity or training overhead.
📝 Abstract
Score-based diffusion models are a highly effective method for generating samples from a distribution of images. We consider scenarios where the training data come from a noisy version of the target distribution, and we present an efficiently implementable modification of the inference procedure that generates noiseless samples. Our approach is motivated by the manifold hypothesis, according to which meaningful data is concentrated around a low-dimensional manifold in a high-dimensional ambient space. The central idea is that noise manifests as low-magnitude variation in off-manifold directions, whereas the relevant variation of the desired distribution is mostly confined to on-manifold directions. We introduce the notion of an extended score and show that, in a simplified setting, it can be used to reduce small variations to zero while leaving large variations mostly unchanged. We describe how an approximation of the extended score can be computed efficiently from an approximation to the standard score, and we demonstrate its efficacy on toy problems, synthetic data, and real data.
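The geometric intuition behind the abstract can be illustrated with a small NumPy sketch. It does not implement the extended score, whose definition is not given in this summary; it only shows, under assumed toy settings, how noise appears as low-magnitude variance in off-manifold directions. The "manifold" here is taken to be a random linear subspace, and the function names, dimensions, and `noise_std` value are all hypothetical choices made for illustration.

```python
import numpy as np


def noisy_subspace_samples(n=5000, ambient_dim=10, manifold_dim=2,
                           noise_std=0.05, seed=0):
    """Draw samples on a random linear subspace, then add isotropic noise.

    Assumption: a linear subspace stands in for the data manifold.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal basis of the subspace, shape (ambient_dim, manifold_dim).
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, manifold_dim)))
    # Clean samples: unit-variance coordinates along the subspace.
    clean = rng.standard_normal((n, manifold_dim)) @ basis.T
    # Noisy samples: small perturbation in every ambient direction.
    noisy = clean + noise_std * rng.standard_normal((n, ambient_dim))
    return clean, noisy, basis


def variance_spectrum(samples):
    """Eigenvalues of the sample covariance, sorted in descending order."""
    cov = np.cov(samples, rowvar=False)
    return np.linalg.eigvalsh(cov)[::-1]


clean, noisy, basis = noisy_subspace_samples()
spectrum = variance_spectrum(noisy)

# Part of each noisy sample that lies outside the subspace.
off_manifold = noisy - noisy @ basis @ basis.T

print("variance spectrum:", np.round(spectrum, 4))
print("off-manifold std:", off_manifold.std())
```

In this toy setting the first `manifold_dim` eigenvalues should be close to 1 and the remaining ones close to `noise_std**2`, which is the kind of gap between large on-manifold variation and small off-manifold variation that the abstract describes.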