🤖 AI Summary
Diffusion models are conventionally interpreted as learning the score function, i.e., the gradient of the log-density of the noised data, which would make the learned vector field conservative; standard training, however, imposes no such constraint. Empirically, trained vector fields strongly violate conservativeness (path-dependent line integrals, nonzero curl) yet still yield high-quality samples, an apparent theoretical inconsistency. Method: The authors reinterpret diffusion training as matching the velocity field of a Wasserstein gradient flow (WGF) rather than as approximating the score function. This perspective arises naturally from optimal transport theory and bypasses reliance on reverse-time stochastic differential equations. Contribution/Results: The WGF framework ensures stable density evolution even under non-conservative errors. Numerical experiments confirm significant non-conservativeness in practice, while demonstrating that the learned vector field, interpreted as a WGF velocity field, still achieves high-fidelity distributional transport and strong sample quality.
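The two conservativeness diagnostics mentioned above can be illustrated on toy 2D fields. This is a hypothetical sketch, not the paper's experiment: `conservative` is an exact gradient field, and `non_conservative` adds a divergence-free rotational part, which makes the scalar curl nonzero and line integrals path-dependent (nonzero circulation around a closed loop).

```python
import numpy as np

# Toy 2D vector fields (hypothetical stand-ins for a trained network's output):
# a conservative field v = -grad U, and the same field plus a rotational part.
def conservative(x, y):
    # v = -grad U with U(x, y) = (x**2 + y**2) / 2, so v = (-x, -y)
    return -x, -y

def non_conservative(x, y):
    # add a divergence-free rotational component (-y, x): nonzero curl
    return -x - y, -y + x

def curl_2d(field, x, y, h=1e-5):
    # scalar curl dv2/dx - dv1/dy via central differences
    v2_xp = field(x + h, y)[1]; v2_xm = field(x - h, y)[1]
    v1_yp = field(x, y + h)[0]; v1_ym = field(x, y - h)[0]
    return (v2_xp - v2_xm) / (2 * h) - (v1_yp - v1_ym) / (2 * h)

def loop_integral(field, n=2000, r=1.0):
    # circulation of the field around a circle of radius r; it vanishes
    # (up to discretization) iff line integrals are path-independent
    t = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    x, y = r * np.cos(t), r * np.sin(t)
    vx, vy = field(x, y)
    # tangent vector (dx/dt, dy/dt) = (-r sin t, r cos t)
    return np.sum(vx * (-r * np.sin(t)) + vy * (r * np.cos(t))) * (2 * np.pi / n)

print(curl_2d(conservative, 0.3, -0.7))      # ~0
print(curl_2d(non_conservative, 0.3, -0.7))  # ~2
print(loop_integral(conservative))           # ~0
print(loop_integral(non_conservative))       # ~2*pi*r**2
```

For the rotational component (-y, x), Green's theorem gives circulation 2·(enclosed area) = 2πr², so both diagnostics flag the same non-conservativeness.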
📝 Abstract
Diffusion models are commonly interpreted as learning the score function, i.e., the gradient of the log-density of noisy data. However, this assumption implies that the target of learning is a conservative vector field, which is not enforced by the neural network architectures used in practice. We present numerical evidence that trained diffusion networks violate both integral and differential constraints required of true score functions, demonstrating that the learned vector fields are not conservative. Despite this, the models perform remarkably well as generative mechanisms. To explain this apparent paradox, we advocate a new theoretical perspective: diffusion training is better understood as flow matching to the velocity field of a Wasserstein Gradient Flow (WGF), rather than as score learning for a reverse-time stochastic differential equation. Under this view, the "probability flow" arises naturally from the WGF framework, eliminating the need to invoke reverse-time SDE theory and clarifying why generative sampling remains successful even when the neural vector field is not a true score. We further show that non-conservative errors from neural approximation do not necessarily harm density transport. Our results advocate for adopting the WGF perspective as a principled, elegant, and theoretically grounded framework for understanding diffusion generative models.
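The idea that a deterministic velocity field can transport densities the same way a diffusion does has a classical illustration (a textbook example, not the paper's setup): the heat equation ∂p/∂t = ½Δp is the WGF of entropy, with velocity v(x,t) = -½∇log p_t(x). For Gaussian marginals N(0, s₀² + t) this velocity is x / (2(s₀² + t)), and integrating particles along it reproduces the diffusion's variance growth without any stochastic term:

```python
import numpy as np

# Heat flow as a Wasserstein gradient flow (standard textbook example):
# for p_t = N(0, s0^2 + t), the WGF velocity -(1/2) d/dx log p_t(x)
# equals x / (2 * (s0^2 + t)).
rng = np.random.default_rng(0)
s0_sq, T, steps = 1.0, 1.0, 1000
dt = T / steps

x = rng.standard_normal(100_000) * np.sqrt(s0_sq)  # samples from N(0, s0^2)
t = 0.0
for _ in range(steps):
    # deterministic Euler step along the velocity field (no noise injected)
    x += dt * x / (2.0 * (s0_sq + t))
    t += dt

# the deterministic "probability flow" matches the diffusion's marginal:
# empirical variance should be close to s0^2 + T = 2.0
print(x.var())
```

The exact flow map here is x ↦ x·√((s₀² + t)/s₀²), so each particle moves smoothly while the ensemble tracks the heat-equation marginals, which is the mechanism the WGF perspective attributes to diffusion sampling.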