AI Summary
Latent diffusion models (LDMs) suffer from impoverished image details, reduced sharpness, and diminished photorealism due to the decoupling of the diffusion process from decoder-based reconstruction. To address this, we propose Latent Perceptual Loss (LPL), the first end-to-end differentiable supervision mechanism that explicitly incorporates intermediate decoder features into latent-space training, thereby bridging diffusion modeling and high-fidelity reconstruction. LPL is architecture- and training-agnostic, seamlessly integrating with diverse generative paradigms including DDPM and flow matching without modifying network structure or optimization procedures. Evaluated at 256x256 and 512x512 resolutions, LPL consistently improves FID by 6-20%, while markedly enhancing image sharpness and visual realism. Our approach establishes a general, plug-and-play supervisory paradigm for optimizing generative quality in latent space.
Abstract
Latent diffusion models (LDMs) power state-of-the-art high-resolution generative image models. LDMs learn the data distribution in the latent space of an autoencoder (AE) and produce images by mapping the generated latents into RGB image space using the AE decoder. While this approach allows for efficient model training and sampling, it induces a disconnect between the training of the diffusion model and the decoder, resulting in a loss of detail in the generated images. To remedy this disconnect, we propose to leverage the internal features of the decoder to define a latent perceptual loss (LPL). This loss encourages the model to create sharper and more realistic images. Our loss can be seamlessly integrated with common autoencoders used in latent diffusion models, and can be applied to different generative modeling paradigms such as DDPM with epsilon and velocity prediction, as well as flow matching. Extensive experiments with models trained on three datasets at 256x256 and 512x512 resolution show improved quantitative results (with boosts between 6% and 20% in FID) and qualitative results when using our perceptual loss.
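The core idea described above can be illustrated with a minimal sketch: compare the decoder's intermediate activations for a predicted latent against those for the target latent, and penalize their distance. This is a toy NumPy illustration only; the decoder here is a hypothetical two-layer stand-in (`W1`, `W2` are assumed frozen weights), not the paper's actual autoencoder or loss weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen decoder weights (stand-ins for a real AE decoder).
W1 = rng.standard_normal((4, 8))
W2 = rng.standard_normal((8, 3))

def decoder_features(z):
    """Return the decoder's intermediate activations for latent z."""
    h1 = np.tanh(z @ W1)   # first intermediate feature map
    h2 = np.tanh(h1 @ W2)  # second intermediate feature map
    return [h1, h2]

def latent_perceptual_loss(z_pred, z_target):
    """Mean squared distance between decoder features of the predicted
    and target latents, averaged over feature levels."""
    feats_pred = decoder_features(z_pred)
    feats_tgt = decoder_features(z_target)
    return float(np.mean([np.mean((p - t) ** 2)
                          for p, t in zip(feats_pred, feats_tgt)]))

# Identical latents incur zero loss; perturbed latents incur positive loss.
z = rng.standard_normal((2, 4))
print(latent_perceptual_loss(z, z))
print(latent_perceptual_loss(z + 0.1, z))
```

In practice this term would be added to the usual latent-space diffusion objective, with gradients flowing through the frozen decoder into the diffusion model.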