🤖 AI Summary
This work addresses geometric modeling and generation of video frame sequences in the latent space of diffusion models. We propose a training-free method for image sequence interpolation and extrapolation grounded in Riemannian geometry. Our core innovation is the first formulation of the implicit probability density of a pre-trained diffusion model as a Riemannian metric tensor, enabling the construction of density-weighted geodesics that naturally evolve along high-probability regions of the latent manifold. By numerically solving geodesic initial- and boundary-value problems under this spatially varying metric, we perform optimal path planning in latent space. The method quantifies how closely a given video frame sequence approximates a geodesic and generates high-fidelity interpolated and extrapolated sequences without any fine-tuning, establishing a new paradigm for the geometric interpretation of diffusion latent spaces and for controllable temporal generation.
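One way to make the density-weighted metric concrete is the following sketch. It assumes a conformal form in which the norm scales as the reciprocal of the model's density \(p\); the paper's exact scaling and exponent are not restated here:

```latex
% Conformal metric: norm inversely proportional to density
\|v\|_{x} \;=\; \frac{\|v\|_{2}}{p(x)},
\qquad
g(x) \;=\; \frac{1}{p(x)^{2}}\, I
% Length of a latent-space path \gamma : [0,1] \to \mathbb{R}^d
\quad\Longrightarrow\quad
L(\gamma) \;=\; \int_{0}^{1} \frac{\|\gamma'(t)\|_{2}}{p(\gamma(t))}\, dt
```

Under this metric, segments through high-density regions contribute little length, so length-minimizing curves (geodesics) bend toward probable images.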
📝 Abstract
Diffusion models indirectly estimate the probability density over a data space, which can be used to study the structure of that space. In this work, we show that geodesics can be computed in diffusion latent space, where the norm induced by the spatially varying inner product is inversely proportional to the probability density. In this formulation, a path that traverses a high-density (that is, probable) region of image latent space is shorter than the equivalent path through a low-density region. We present algorithms for solving the associated initial and boundary value problems, and we show how to compute the probability density along a path and the geodesic distance between two points. Using these techniques, we analyze how closely video clips approximate geodesics in the latent space of a pre-trained image diffusion model. Finally, we demonstrate how these techniques can be applied to training-free image sequence interpolation and extrapolation, given a pre-trained image diffusion model.
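The boundary value problem described above can be illustrated with a minimal sketch: minimize a discretized geodesic energy under a conformal metric whose norm is inversely proportional to the density. Everything below is illustrative, not the paper's solver: a toy 2-D Gaussian bump stands in for the diffusion model's density, the metric is assumed to be \(g(x) = p(x)^{-2} I\), and the optimizer is plain finite-difference gradient descent with backtracking.

```python
import numpy as np

def p(x):
    """Toy density: a Gaussian bump at (0, 1) plus a small floor.
    A hypothetical stand-in for the density a diffusion model induces."""
    return np.exp(-4.0 * np.sum((x - np.array([0.0, 1.0])) ** 2)) + 1e-2

def energy(path):
    """Discrete geodesic energy under the conformal metric ||v||_x = |v| / p(x):
    sum of squared segment lengths weighted by 1 / p(midpoint)^2."""
    total = 0.0
    for a, b in zip(path[:-1], path[1:]):
        mid = 0.5 * (a + b)
        total += np.sum((b - a) ** 2) / p(mid) ** 2
    return total

def geodesic_bvp(x0, x1, n=16, iters=200, lr=1e-2, eps=1e-5):
    """Approximate the geodesic boundary value problem: hold the endpoints
    fixed and descend the discrete energy over the n interior points,
    using central finite differences and backtracking line search."""
    t = np.linspace(0.0, 1.0, n + 2)[:, None]
    path = (1 - t) * np.asarray(x0, float) + t * np.asarray(x1, float)
    e = energy(path)  # straight-line initialization
    for _ in range(iters):
        grad = np.zeros_like(path)
        for i in range(1, n + 1):           # endpoints stay fixed
            for d in range(path.shape[1]):
                old = path[i, d]
                path[i, d] = old + eps
                e_plus = energy(path)
                path[i, d] = old - eps
                e_minus = energy(path)
                path[i, d] = old
                grad[i, d] = (e_plus - e_minus) / (2 * eps)
        step = lr
        while step > 1e-12:                 # accept only decreasing steps
            cand = path - step * grad
            e_cand = energy(cand)
            if e_cand < e:
                path, e = cand, e_cand
                break
            step *= 0.5
    return path
```

Because the straight line between `(-1, 0)` and `(1, 0)` crosses a low-density region, the optimized path bends toward the bump at `(0, 1)`, mirroring the abstract's claim that probable regions are "shorter" under this metric.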