🤖 AI Summary
This work addresses the theoretical complexity of diffusion models, and the multiple competing justifications for them in the literature, by proposing a simple and largely self-contained framework. Rather than relying on the theory of Markov chains or reverse diffusion, it interprets diffusion sampling as a sequence of random walks and builds on Tweedie's formula. The resulting templates cleanly separate network training from sampling and enable conditional sampling without any likelihood approximation. Key contributions include: (1) a theoretical justification for score-based diffusion models that does not depend on Markov chains or reverse diffusion; (2) decoupling of the noise schedule used during training from the one used during sampling; and (3) a demonstration that several existing diffusion models, including DDPM, DDIM, and Score SDE, correspond to particular choices within the template, and that other, more straightforward algorithmic choices also lead to effective diffusion models.
📝 Abstract
We present a simple template for designing generative diffusion model algorithms based on an interpretation of diffusion sampling as a sequence of random walks. Score-based diffusion models are widely used to generate high-quality images. Diffusion models have also been shown to yield state-of-the-art performance in many inverse problems. While these algorithms are often surprisingly simple, the theory behind them is not, and multiple complex theoretical justifications exist in the literature. Here, we provide a simple and largely self-contained theoretical justification for score-based diffusion models that avoids using the theory of Markov chains or reverse diffusion, instead centering the theory of random walks and Tweedie's formula. This approach leads to unified algorithmic templates for network training and sampling. In particular, these templates cleanly separate training from sampling, e.g., the noise schedule used during training need not match the one used during sampling. We show that several existing diffusion models correspond to particular choices within this template and demonstrate that other, more straightforward algorithmic choices lead to effective diffusion models. The proposed framework has the added benefit of enabling conditional sampling without any likelihood approximation.
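As a rough illustration of the two ingredients named in the abstract, the sketch below works through a one-dimensional toy case. It is not the paper's algorithm: the Gaussian data distribution, the closed-form score (standing in for a trained network), the Langevin-style update, the step-size rule, and all names (`MU`, `S`, `score`, `tweedie_denoise`, `random_walk_sample`, `sampling_sigmas`) are assumptions made for this example. Tweedie's formula states that for y = x + sigma * n with standard Gaussian n, the posterior mean is E[x | y] = y + sigma^2 * (score of the noisy density at y).

```python
import math
import random

# Assumed toy data distribution: x ~ N(MU, S^2). Chosen so the score is known exactly.
MU, S = 2.0, 0.5


def score(y, sigma):
    """Score (gradient of log density) of the noisy marginal N(MU, S^2 + sigma^2)."""
    return -(y - MU) / (S**2 + sigma**2)


def tweedie_denoise(y, sigma):
    """Tweedie's formula: E[x | y] = y + sigma^2 * score(y)."""
    return y + sigma**2 * score(y, sigma)


def posterior_mean(y, sigma):
    """Closed-form E[x | y] for the Gaussian toy model, used only as a cross-check."""
    w = S**2 / (S**2 + sigma**2)
    return MU + w * (y - MU)


def random_walk_sample(sigmas, steps_per_level, rng):
    """Langevin-style random walk through a decreasing list of noise levels."""
    x = rng.gauss(0.0, sigmas[0])  # arbitrary broad initialization
    for sigma in sigmas:
        # Hypothetical step-size rule: proportional to the variance of the current target.
        tau = 0.1 * (S**2 + sigma**2)
        for _ in range(steps_per_level):
            noise = rng.gauss(0.0, 1.0)
            x = x + tau * score(x, sigma) + math.sqrt(2.0 * tau) * noise
    return x


rng = random.Random(0)
# The sampling schedule is picked freely here; in this toy the score is analytic,
# so nothing ties it to a training schedule.
sampling_sigmas = [4.0, 2.0, 1.0, 0.5, 0.1, 0.01]
samples = [random_walk_sample(sampling_sigmas, 30, rng) for _ in range(2000)]

sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((v - sample_mean) ** 2 for v in samples) / len(samples))
print(f"sample mean ~ {sample_mean:.3f} (target {MU}), sample std ~ {sample_std:.3f} (target {S})")
```

In this toy setting the random walk should end up close to the data distribution, with a small bias from the finite step size. With real data the analytic `score` would be replaced by a learned estimate, which is where Tweedie's formula links a denoiser to the score.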