🤖 AI Summary
Existing diffusion models do not exploit the geometric structure of the data manifold they implicitly learn, limiting the realism of image interpolation and editing as well as prompt fidelity. This paper formulates the data space of a pre-trained diffusion model as a Riemannian manifold whose metric is induced by the score function, yielding a differentiable, parameter-free intrinsic metric and a geodesic-based, geometry-aware interpolation method. The approach requires no fine-tuning or additional training and is compatible with diverse diffusion architectures. Experiments on MNIST and Stable Diffusion show that the method produces interpolations with reduced noise, improved structural coherence, and markedly better text-prompt fidelity, outperforming standard Euclidean interpolation and latent-space mapping strategies. The core contribution is the first Riemannian metric framework grounded in score matching, enabling explicit and computationally efficient use of the geometry inherent in diffusion models.
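The idea of a score-induced metric and geodesic interpolation can be illustrated with a toy sketch. The paper's actual metric is not reproduced in this summary, so the construction below is an assumption for illustration only: a hypothetical metric g(x) = I + λ s(x)s(x)ᵀ built from the score s(x) of a known Gaussian density, with geodesics approximated by minimizing the discrete Riemannian path energy starting from the straight-line (Euclidean) interpolation.

```python
import numpy as np

def score(x, mu=np.zeros(2), sigma=1.0):
    # Score of an isotropic Gaussian N(mu, sigma^2 I): s(x) = -(x - mu) / sigma^2.
    # In the paper's setting this would come from a pre-trained diffusion model.
    return -(x - mu) / sigma**2

def metric(x, lam=0.5):
    # Hypothetical score-induced metric (illustrative assumption, not the
    # paper's formula): identity plus a rank-one term along the score.
    s = score(x)
    return np.eye(len(x)) + lam * np.outer(s, s)

def path_energy(path):
    # Discrete Riemannian path energy: sum_i dx_i^T g(midpoint_i) dx_i.
    e = 0.0
    for a, b in zip(path[:-1], path[1:]):
        dx = b - a
        e += dx @ metric(0.5 * (a + b)) @ dx
    return e

def geodesic(x0, x1, n=15, steps=100, lr=0.01, eps=1e-5):
    # Initialize with straight-line interpolation, then relax the interior
    # points by (finite-difference) gradient descent on the path energy.
    ts = np.linspace(0.0, 1.0, n)[:, None]
    path = (1 - ts) * x0 + ts * x1
    for _ in range(steps):
        grad = np.zeros_like(path)
        for i in range(1, n - 1):            # endpoints stay fixed
            for d in range(path.shape[1]):
                p = path.copy(); p[i, d] += eps
                m = path.copy(); m[i, d] -= eps
                grad[i, d] = (path_energy(p) - path_energy(m)) / (2 * eps)
        new_path = path - lr * grad
        if path_energy(new_path) < path_energy(path):
            path = new_path                  # accept only energy-decreasing steps
        else:
            lr *= 0.5                        # backtrack if the step overshoots
    return path
```

Under this toy metric, the relaxed path never has higher energy than the Euclidean straight line it starts from; the geometry-aware interpolant bends toward regions the score-induced metric makes "cheap" to traverse, which is the qualitative effect the paper exploits.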
📝 Abstract
Diffusion models excel at content generation by implicitly learning the data manifold, yet, unlike other deep generative models equipped with latent spaces, they lack a practical way to leverage this manifold. This paper introduces a novel framework that treats the data space of pre-trained diffusion models as a Riemannian manifold, with a metric derived from the score function. Experiments on MNIST and Stable Diffusion show that this geometry-aware approach yields image interpolations that are more realistic, less noisy, and more faithful to prompts than existing methods, demonstrating its potential for improved content generation and editing.