🤖 AI Summary
This work addresses the poorly understood causes of performance degradation when diffusion models are trained in latent spaces, such as those induced by variational autoencoders (VAEs). The authors quantify the “diffusability” of a latent space via the rate of change of the minimum mean squared error (MMSE) along the diffusion trajectory, and decompose this rate into contributions from Fisher information (FI) and the Fisher information rate (FIR). They show that global isometry ensures FI alignment, whereas FIR is governed by the encoder’s local geometric properties. Latent geometric distortion is decoupled into three measurable penalties (dimensional compression, tangential distortion, and curvature injection), and theoretical conditions for preserving FIR across spaces are derived. The result is a diagnostic framework grounded in differential and information geometry. Experiments across diverse autoencoder architectures validate FI and FIR as efficient diagnostic metrics for identifying and mitigating latent diffusion failures.
📝 Abstract
Diffusion models often degrade when trained in latent spaces (e.g., those of VAEs), yet the formal causes remain poorly understood. We quantify latent-space diffusability through the rate of change of the Minimum Mean Squared Error (MMSE) along the diffusion trajectory. Our framework decomposes this MMSE rate into contributions from Fisher Information (FI) and Fisher Information Rate (FIR). We demonstrate that while global isometry ensures FI alignment, FIR is governed by the encoder's local geometric properties. Our analysis explicitly decouples latent geometric distortion into three measurable penalties: dimensional compression, tangential distortion, and curvature injection. We derive theoretical conditions for FIR preservation across spaces, ensuring maintained diffusability. Experiments across diverse autoencoding architectures validate our framework and establish these efficient FI and FIR metrics as a robust diagnostic suite for identifying and mitigating latent diffusion failure.
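To make the MMSE-rate decomposition concrete, here is a minimal sketch for a toy case. It rests on assumptions that are not taken from the paper: a variance-exploding noising process y_t = x + sqrt(t) z with z ~ N(0, 1), and one-dimensional Gaussian data x ~ N(0, sigma2). Under those assumptions the standard identity MMSE(t) = t - t^2 J(t) holds, where J(t) is the Fisher information of the noised marginal, so the MMSE rate is 1 - 2 t J(t) - t^2 J'(t), that is, one FI term and one FIR term. The paper's own parameterization and normalization may differ, and the function names below are illustrative only.

```python
# Toy check (assumed setup, not the paper's code): 1-D Gaussian data
# x ~ N(0, sigma2) noised as y_t = x + sqrt(t) * z, with z ~ N(0, 1).
# The noised marginal is then N(0, sigma2 + t), so everything is closed form.


def fisher_information(t, sigma2):
    """FI of the marginal N(0, sigma2 + t): J(t) = 1 / (sigma2 + t)."""
    return 1.0 / (sigma2 + t)


def fisher_information_rate(t, sigma2):
    """FIR, taken here as dJ/dt = -1 / (sigma2 + t)^2."""
    return -1.0 / (sigma2 + t) ** 2


def mmse(t, sigma2):
    """Closed-form MMSE of estimating x from y_t for Gaussian data."""
    return sigma2 * t / (sigma2 + t)


def mmse_rate_from_fi(t, sigma2):
    """MMSE rate written as an FI term plus an FIR term.

    Differentiating MMSE(t) = t - t^2 * J(t) gives
    d/dt MMSE = 1 - 2 * t * J(t) - t^2 * J'(t).
    """
    j = fisher_information(t, sigma2)
    dj = fisher_information_rate(t, sigma2)
    return 1.0 - 2.0 * t * j - t * t * dj


def mmse_rate_finite_difference(t, sigma2, h=1e-5):
    """Central finite difference of the closed-form MMSE, for comparison."""
    return (mmse(t + h, sigma2) - mmse(t - h, sigma2)) / (2.0 * h)


if __name__ == "__main__":
    for sigma2 in (0.5, 2.0):
        for t in (0.1, 0.5, 2.0):
            analytic = mmse_rate_from_fi(t, sigma2)
            numeric = mmse_rate_finite_difference(t, sigma2)
            print(f"sigma2={sigma2} t={t} FI/FIR form={analytic:.6f} "
                  f"finite diff={numeric:.6f}")
```

In this toy case both expressions reduce to sigma2^2 / (sigma2 + t)^2, so the two printed columns should agree up to finite-difference error. For non-Gaussian latents J(t) and J'(t) have no closed form and would have to be estimated, for example from a learned score function.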