🤖 AI Summary
Existing theoretical analyses of diffusion generative models are fragmented and suffer from inconsistent notation, hindering unified understanding and principled development.
Method: This work establishes a rigorous, unified mathematical framework grounded in fundamental properties of Gaussian distributions. It systematically derives the closed-form marginal distribution of the forward noising process, the exact form of the reverse posterior, and the variational lower bound, which reduces to the standard noise-prediction training objective.
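The three derived quantities can be made concrete with a minimal scalar sketch in plain Python. It assumes a linear β schedule (1e-4 to 0.02, a common DDPM choice that this work does not necessarily use), 0-indexed timesteps, and a hypothetical noise predictor `eps_model`. All function names are illustrative, not the paper's code.

```python
import math
import random


def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear variance schedule beta_0 .. beta_{T-1}
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def alpha_bars(betas):
    # abar_t = prod_{s <= t} (1 - beta_s)
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, eps):
    # Closed-form forward marginal: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps
    return math.sqrt(abar[t]) * x0 + math.sqrt(1.0 - abar[t]) * eps


def posterior_mean_var(x0, xt, t, betas, abar):
    # Exact reverse posterior q(x_{t-1} | x_t, x0) = N(mean, var), valid for t >= 1
    abar_prev, alpha_t = abar[t - 1], 1.0 - betas[t]
    coef_x0 = math.sqrt(abar_prev) * betas[t] / (1.0 - abar[t])
    coef_xt = math.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar[t])
    var = (1.0 - abar_prev) / (1.0 - abar[t]) * betas[t]
    return coef_x0 * x0 + coef_xt * xt, var


def noise_prediction_loss(eps_model, x0, betas, abar, rng):
    # Single-sample estimate of E_{t, eps} (eps - eps_model(x_t, t))^2
    t = rng.randrange(len(betas))
    eps = rng.gauss(0.0, 1.0)
    xt = q_sample(x0, t, abar, eps)
    return (eps - eps_model(xt, t)) ** 2
```

Composing the one-step kernels N(√(1 − β_t) x_{t−1}, β_t) reproduces the variance 1 − ᾱ_t of the closed-form marginal, and the posterior coefficients coincide with a direct Bayes'-rule computation, which is a quick way to sanity-check any implementation of these formulas.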
Contribution/Results: The framework shows that rectified flow recovers DDIM up to a time re-parameterisation; gives a unified probabilistic interpretation of classifier guidance (a posterior score correction) and classifier-free guidance (an interpolation between conditional and unconditional scores); and integrates SDE/ODE formulations, the Fokker–Planck equation, flow matching, and multi-scale modeling in consistent notation, aiming at both theoretical coherence and practical implementability. Stable Diffusion serves as a canonical example of latent diffusion, and the treatment covers accelerated sampling (DDIM, DDGAN) alongside likelihood estimation.
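The DDIM/rectified-flow correspondence can be checked numerically on scalars. This sketch assumes one common construction, not necessarily the one used in this work: states are rescaled by 1/(√ᾱ + √(1 − ᾱ)) and time is mapped to s = √(1 − ᾱ)/(√ᾱ + √(1 − ᾱ)), so that x_t becomes a point on the straight path (1 − s)x₀ + sε. The function names are hypothetical.

```python
import math


def ddim_step(x_t: float, eps_hat: float, abar_t: float, abar_prev: float) -> float:
    """Deterministic DDIM update (eta = 0) driven by a noise prediction eps_hat."""
    # Clean-sample estimate implied by eps_hat
    x0_hat = (x_t - math.sqrt(1.0 - abar_t) * eps_hat) / math.sqrt(abar_t)
    # Re-noise x0_hat to the previous noise level with the same eps_hat
    return math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * eps_hat


def rf_time(abar: float) -> float:
    """Assumed time map s = sqrt(1 - abar) / (sqrt(abar) + sqrt(1 - abar)), in [0, 1]."""
    a, b = math.sqrt(abar), math.sqrt(1.0 - abar)
    return b / (a + b)


def rf_scale(abar: float) -> float:
    """Assumed state rescaling factor: z = x / (sqrt(abar) + sqrt(1 - abar))."""
    return 1.0 / (math.sqrt(abar) + math.sqrt(1.0 - abar))


def ddim_via_rectified_flow(x_t: float, eps_hat: float, abar_t: float, abar_prev: float) -> float:
    """Same update written as one Euler step of dz/ds = eps - x0, then mapped back."""
    x0_hat = (x_t - math.sqrt(1.0 - abar_t) * eps_hat) / math.sqrt(abar_t)
    velocity = eps_hat - x0_hat  # constant velocity along the straight path
    z = x_t * rf_scale(abar_t)
    z_next = z + (rf_time(abar_prev) - rf_time(abar_t)) * velocity
    return z_next / rf_scale(abar_prev)
```

With these choices the two functions agree up to floating-point error, for example `ddim_step(1.3, 0.4, 0.5, 0.8)` and `ddim_via_rectified_flow(1.3, 0.4, 0.5, 0.8)`. The reason is that a straight path has constant velocity, so the Euler step is exact once x̂₀ and ε̂ are fixed.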
📝 Abstract
We present a concise, self-contained derivation of diffusion-based generative models. Starting from basic properties of Gaussian distributions (densities, quadratic expectations, re-parameterisation, products, and KL divergences), we construct denoising diffusion probabilistic models from first principles. This includes the forward noising process, its closed-form marginals, the exact discrete reverse posterior, and the associated variational bound, which simplifies to the standard noise-prediction objective used in practice. We then discuss likelihood estimation and accelerated sampling, covering DDIM, adversarially learned reverse dynamics (DDGAN), and multi-scale variants such as nested and latent diffusion, with Stable Diffusion as a canonical example. A continuous-time formulation follows, in which we derive the probability-flow ODE from the diffusion SDE via the continuity and Fokker–Planck equations, introduce flow matching, and show how rectified flows recover DDIM up to a time re-parameterisation. Finally, we treat guided diffusion, interpreting classifier guidance as a posterior score correction and classifier-free guidance as a principled interpolation between conditional and unconditional scores. Throughout, the focus is on transparent algebra, explicit intermediate steps, and consistent notation, so that readers can both follow the theory and implement the corresponding algorithms in practice.
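The two guidance rules can be compared on one-dimensional Gaussians, where every score is analytic. This is a sketch under stated assumptions: p(x) and p(x | y) are taken to be Gaussians, and the "classifier" is the implicit one given by Bayes' rule, ∇ log p(y | x) = ∇ log p(x | y) − ∇ log p(x). The names `cfg_score` and `classifier_guided_score` are hypothetical.

```python
from typing import Callable

Score = Callable[[float], float]


def gaussian_score(x: float, mean: float, var: float) -> float:
    """Score of a 1-D Gaussian: d/dx log N(x; mean, var) = -(x - mean) / var."""
    return -(x - mean) / var


def classifier_guided_score(x: float, score_uncond: Score,
                            classifier_grad: Score, scale: float) -> float:
    """Classifier guidance: grad log p(x) + scale * grad log p(y | x)."""
    return score_uncond(x) + scale * classifier_grad(x)


def cfg_score(x: float, score_uncond: Score, score_cond: Score, w: float) -> float:
    """Classifier-free guidance: s_u + w * (s_c - s_u).

    w = 0 gives the unconditional score, w = 1 the conditional score,
    and w > 1 extrapolates beyond the conditional score.
    """
    s_u, s_c = score_uncond(x), score_cond(x)
    return s_u + w * (s_c - s_u)


# Assumed example densities: p(x) = N(0, 4) and p(x | y) = N(1.5, 1).
def score_u(x: float) -> float:
    return gaussian_score(x, 0.0, 4.0)


def score_c(x: float) -> float:
    return gaussian_score(x, 1.5, 1.0)


def implicit_classifier(x: float) -> float:
    # Bayes' rule: grad log p(y | x) = grad log p(x | y) - grad log p(x)
    return score_c(x) - score_u(x)
```

With the implicit classifier, guidance scale w gives the same score under both rules. In the Gaussian case the CFG score is the score of the density proportional to p(x)^(1 − w) p(x | y)^w, here a Gaussian with precision (1 − w)/4 + w whenever that precision is positive.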