🤖 AI Summary
This work investigates how the local geometric structure of the data manifold learned by generative models influences generation quality—specifically aesthetic score, diversity, and memorization. We propose three geometric descriptors—scale, rank, and nonsmoothness—to systematically characterize the local manifold geometry for the first time. Building upon these, we design a self-supervised “geometric reward” mechanism that requires no human annotations, enabling geometry-guided generation. Our method integrates differential geometric analysis with diffusion process modeling, grounded in the theory of continuous piecewise-linear (CPWL) generators. We validate its generality across DDPM, DiT, and Stable Diffusion 1.4. Experiments demonstrate that the geometric descriptors strongly predict generation performance; training the reward model using only local scale improves aesthetic score by 12.7% and diversity by 18.3% on Stable Diffusion. This work establishes a new paradigm for interpretable and controllable generative modeling.
📝 Abstract
Deep Generative Models are frequently used to learn continuous representations of complex data distributions using a finite number of samples. For any generative model, including pre-trained foundation models with Diffusion or Transformer architectures, generation performance can significantly vary across the learned data manifold. In this paper we study the local geometry of the learned manifold and its relationship to generation outcomes for a wide range of generative models, including DDPM, Diffusion Transformer (DiT), and Stable Diffusion 1.4. Building on the theory of continuous piecewise-linear (CPWL) generators, we characterize the local geometry in terms of three geometric descriptors - scaling ($psi$), rank ($
u$), and complexity/un-smoothness ($delta$). We provide quantitative and qualitative evidence showing that for a given latent-image pair, the local descriptors are indicative of generation aesthetics, diversity, and memorization by the generative model. Finally, we demonstrate that by training a reward model on the local scaling for Stable Diffusion, we can self-improve both generation aesthetics and diversity using `geometry reward' based guidance during denoising.