🤖 AI Summary
This work studies the speed-accuracy trade-off in diffusion models by connecting them to stochastic thermodynamics, the nonequilibrium thermodynamics of the Fokker-Planck equation. It derives speed-accuracy relations: inequalities that relate the accuracy of data generation to the entropy production rate, which can be read as the speed of the diffusion dynamics when no non-conservative force is present. It then introduces an optimal learning protocol given by the geodesic in the space of probability distributions equipped with the 2-Wasserstein distance from optimal transport theory. Numerical experiments with different noise schedules and different data illustrate the relations, compare optimal and suboptimal protocols, show how a non-conservative force degrades generation accuracy, and demonstrate applicability to real-world image datasets.
📝 Abstract
We discuss a connection between diffusion models, a class of generative models, and stochastic thermodynamics, the nonequilibrium thermodynamics of the Fokker-Planck equation. Using techniques from stochastic thermodynamics, we derive speed-accuracy relations for diffusion models: inequalities that relate the accuracy of data generation to the entropy production rate, which can be interpreted as the speed of the diffusion dynamics in the absence of a non-conservative force. From a stochastic thermodynamic perspective, our results provide quantitative insight into how best to generate data in diffusion models. The optimal learning protocol is given by the geodesic in the space equipped with the 2-Wasserstein distance from optimal transport theory. We numerically illustrate the validity of the speed-accuracy relations for diffusion models with different noise schedules and different data, and we numerically examine our results for optimal and suboptimal learning protocols. We also show that a non-conservative force leads to inaccurate data generation, and that our results apply to data generation from real-world image datasets.
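To make the geodesic idea concrete, here is a minimal Python sketch. It is not taken from the paper and does not reproduce its speed-accuracy inequalities. It assumes a toy setting of one-dimensional Gaussians, where the 2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2) is sqrt((m0 - m1)^2 + (s0 - s1)^2), and the geodesic linearly interpolates the mean and the standard deviation. The function names, the endpoint values, and the "warped" schedule are all illustrative assumptions. The sketch compares the time-integrated squared Wasserstein speed of a constant-speed geodesic with that of the same path traversed at a non-uniform speed.

```python
import numpy as np


def w2_gaussian_1d(m0, s0, m1, s1):
    """2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2) in 1D."""
    return float(np.hypot(m1 - m0, s1 - s0))


def path_cost(means, stds, tau):
    """Discrete estimate of the integral over [0, tau] of (Wasserstein speed)^2.

    means, stds: Gaussian parameters sampled at equally spaced times.
    """
    n_steps = len(means) - 1
    dt = tau / n_steps
    # W2 distance between consecutive Gaussians along the path
    step_lengths = np.hypot(np.diff(means), np.diff(stds))
    return float(np.sum(step_lengths**2) / dt)


# Illustrative endpoints (hypothetical values, not from the paper)
m0, s0 = 2.0, 0.5   # "data-like" Gaussian
m1, s1 = 0.0, 1.0   # "noise-like" Gaussian
tau = 1.0           # total duration
u = np.linspace(0.0, 1.0, 1001)

# Constant-speed geodesic: linear interpolation of mean and std
geo_cost = path_cost((1 - u) * m0 + u * m1, (1 - u) * s0 + u * s1, tau)

# Same geometric path, traversed slowly at first and quickly at the end
w = u**2
warped_cost = path_cost((1 - w) * m0 + w * m1, (1 - w) * s0 + w * s1, tau)

w2 = w2_gaussian_1d(m0, s0, m1, s1)
print(f"W2^2 / tau         = {w2**2 / tau:.4f}")
print(f"geodesic cost      = {geo_cost:.4f}")
print(f"warped-path cost   = {warped_cost:.4f}")
```

In this toy case the constant-speed geodesic attains the lower bound W2^2 / tau on the integrated squared speed, while the non-uniform schedule pays a higher cost for the same endpoints. This is the sense in which a geodesic protocol is "optimal" in optimal transport; how that cost enters the paper's bounds on generation accuracy is specified by the paper's own relations, not by this sketch.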