🤖 AI Summary
This work uncovers a strong low-dimensional geometric regularity in the deterministic sampling trajectories of diffusion models: each trajectory lies within an extremely low-dimensional subspace, and all trajectories share an almost identical "boomerang" shape, irrespective of model architecture, conditioning inputs, or generated content. To characterize this phenomenon, the authors analyze the trajectories under probability flow ODEs, in particular through closed-form solutions based on kernel-estimated data modeling. Building on this, they design a dynamic-programming-based strategy that aligns the sampling time schedule with the trajectory structure and improves image generation quality, especially at budgets of only 5–10 function evaluations. The method is lightweight, requires no retraining, needs only minimal modification to existing ODE-based numerical solvers (e.g., DOPRI5), and incurs negligible computational overhead.
📝 Abstract
Diffusion-based generative models employ stochastic differential equations (SDEs) and their equivalent probability flow ordinary differential equations (ODEs) to establish a smooth transformation between complex high-dimensional data distributions and tractable prior distributions. In this paper, we reveal a striking geometric regularity in the deterministic sampling dynamics: each simulated sampling trajectory lies within an extremely low-dimensional subspace, and all trajectories exhibit an almost identical "boomerang" shape, regardless of the model architecture, applied conditions, or generated content. We characterize several intriguing properties of these trajectories, particularly under closed-form solutions based on kernel-estimated data modeling. We also demonstrate a practical application of the discovered trajectory regularity by proposing a dynamic programming-based scheme to better align the sampling time schedule with the underlying trajectory structure. This simple strategy requires minimal modification to existing ODE-based numerical solvers, incurs negligible computational overhead, and achieves superior image generation performance, especially in regions with only $5 \sim 10$ function evaluations.
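To make the idea of a dynamic-programming-based time schedule concrete, here is a minimal sketch of how such a selection could look. It is not the paper's algorithm: the abstract does not state the cost being minimized, so `select_schedule` and its `cost(i, j)` argument are hypothetical names, and the cost is a placeholder for whatever per-step error measure one assumes (for example, the deviation of a single coarse solver step from a fine-grid reference trajectory). The sketch picks a fixed number of steps over a fine time grid so that the total cost of the chosen jumps is minimal.

```python
from typing import Callable, List


def select_schedule(num_grid: int, num_steps: int,
                    cost: Callable[[int, int], float]) -> List[int]:
    """Choose `num_steps` jumps over grid indices 0..num_grid-1 with minimal total cost.

    `cost(i, j)` is an assumed, user-supplied penalty for stepping directly
    from grid index i to grid index j (i < j). The returned list starts at
    index 0 and ends at index num_grid - 1.
    """
    inf = float("inf")
    last = num_grid - 1
    # best[k][j]: minimal total cost of reaching grid index j with exactly k jumps.
    best = [[inf] * num_grid for _ in range(num_steps + 1)]
    # prev[k][j]: predecessor index on the optimal path to j with k jumps.
    prev = [[-1] * num_grid for _ in range(num_steps + 1)]
    best[0][0] = 0.0

    for k in range(1, num_steps + 1):
        for j in range(1, num_grid):
            for i in range(j):
                if best[k - 1][i] == inf:
                    continue  # index i is unreachable with k - 1 jumps
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    prev[k][j] = i

    if best[num_steps][last] == inf:
        raise ValueError("no schedule with that many steps fits the grid")

    # Backtrack from the final grid index to recover the chosen indices.
    path = [last]
    j = last
    for k in range(num_steps, 0, -1):
        j = prev[k][j]
        path.append(j)
    return path[::-1]


# Toy usage: with a squared-gap cost on a uniform grid, the optimum is evenly spaced.
schedule = select_schedule(9, 4, lambda i, j: (j - i) ** 2)
print(schedule)  # [0, 2, 4, 6, 8]
```

The triple loop costs O(K·M²) evaluations of `cost` for K steps on an M-point grid, which is small for budgets of 5 to 10 steps. With a cost derived from actual trajectory geometry, the selected indices would generally be non-uniform; the uniform result above only reflects the toy cost.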