Understanding Flatness in Generative Models: Its Role and Benefits

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the critical role of loss landscape flatness in diffusion model performance and robustness. Addressing the vulnerability of generative models to prior perturbations and quantization-induced degradation, we theoretically establish and empirically validate that flat minima mitigate exposure bias and significantly enhance robustness against both prior perturbations and model quantization. We are the first to explicitly incorporate Sharpness-Aware Minimization (SAM) into diffusion model training—outperforming implicit flattening methods such as Stochastic Weight Averaging (SWA) and Exponential Moving Average (EMA). Extensive experiments on CIFAR-10, LSUN Tower, and FFHQ demonstrate that explicit flattening reduces Fréchet Inception Distance (FID) by an average of 12% and substantially improves Learned Perceptual Image Patch Similarity (LPIPS). Moreover, under 8-bit weight quantization, generated image quality remains nearly intact, with FID degradation approaching zero.
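For readers unfamiliar with SAM, here is a minimal PyTorch-style sketch of how its two-step ascent-then-descent update could wrap a standard DDPM noise-prediction loss. Everything here (`model`, `q_sample`, `rho=0.05`) is an illustrative assumption rather than the authors' code; the update itself follows the original SAM recipe of perturbing the weights by `rho * g / ||g||` before taking the descent gradient.

```python
import torch
import torch.nn.functional as F

def sam_diffusion_step(model, optimizer, q_sample, x0, rho=0.05, num_timesteps=1000):
    """One SAM update on a DDPM-style noise-prediction loss (sketch only).

    Step 1 climbs to an adversarial weight perturbation inside an L2 ball
    of radius rho; step 2 descends with the gradient taken at that
    perturbed point, which biases training toward flat minima.
    """
    # Sample a timestep and Gaussian noise; build the noised input x_t.
    t = torch.randint(0, num_timesteps, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)  # assumed forward-diffusion helper

    # Step 1: gradient of the loss at the current weights w.
    optimizer.zero_grad()
    F.mse_loss(model(x_t, t), noise).backward()

    # Perturb w -> w + e with e = rho * g / ||g|| (SAM's inner ascent).
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        perturbed = []
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            perturbed.append((p, e))

    # Step 2: gradient at the perturbed weights w + e.
    optimizer.zero_grad()
    loss = F.mse_loss(model(x_t, t), noise)
    loss.backward()

    # Undo the perturbation, then apply the sharpness-aware descent step.
    with torch.no_grad():
        for p, e in perturbed:
            p.sub_(e)
    optimizer.step()
    return loss.item()
```

The extra forward and backward pass roughly doubles the per-step cost, which is the usual trade-off for explicit flattening.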

📝 Abstract
Flat minima, known to enhance generalization and robustness in supervised learning, remain largely unexplored in generative models. In this work, we systematically investigate the role of loss surface flatness in generative models, both theoretically and empirically, with a particular focus on diffusion models. We establish a theoretical claim that flatter minima improve robustness against perturbations in target prior distributions, leading to benefits such as reduced exposure bias -- where errors in noise estimation accumulate over iterations -- and significantly improved resilience to model quantization, preserving generative performance even under strong quantization constraints. We further observe that Sharpness-Aware Minimization (SAM), which explicitly controls the degree of flatness, effectively enhances flatness in diffusion models, whereas other well-known methods such as Stochastic Weight Averaging (SWA) and Exponential Moving Average (EMA), which promote flatness indirectly via ensembling, are less effective. Through extensive experiments on CIFAR-10, LSUN Tower, and FFHQ, we demonstrate that flat minima in diffusion models indeed improve not only generative performance but also robustness.
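For contrast, the implicit flattening baselines named in the abstract average weights rather than perturb them. Below is a minimal sketch of the EMA variant; the decay value and class interface are illustrative assumptions, not the paper's configuration. SWA is analogous but keeps a uniform average of checkpoints collected late in training.

```python
import torch

class EMA:
    """Exponential moving average of model weights: the 'implicit
    flattening via ensembling' baseline contrasted with SAM."""

    def __init__(self, model, decay=0.9999):
        self.decay = decay
        self.shadow = {name: p.detach().clone()
                       for name, p in model.named_parameters()}

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for name, p in model.named_parameters():
            self.shadow[name].mul_(self.decay).add_(p, alpha=1 - self.decay)

    @torch.no_grad()
    def copy_to(self, model):
        # Swap in the averaged weights for sampling and evaluation.
        for name, p in model.named_parameters():
            p.copy_(self.shadow[name])
```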
Problem

Research questions and friction points this paper is trying to address.

Explores the role of flat minima in the robustness of generative models.
Investigates whether flat minima reduce exposure bias (sketched after this list).
Assesses which flatness-enhancing methods are effective for diffusion models.
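As a rough sketch of the exposure-bias mechanism named above, using standard DDPM notation (an assumption; this paper's exact formulation is not reproduced here): training conditions the denoiser on inputs from the true forward process, while sampling conditions it on states built from its own earlier predictions, so per-step errors can compound across the reverse trajectory.

```latex
% Training input, drawn from the true forward process:
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Sampling input, assembled from the model's own predictions:
\hat{x}_{t-1} = \mu_\theta(\hat{x}_t, t) + \sigma_t z,
\qquad z \sim \mathcal{N}(0, I)

% Exposure bias: \hat{x}_t \neq x_t, and the mismatch can grow over the
% T reverse steps; a flatter loss surface makes \epsilon_\theta less
% sensitive to this input perturbation.
```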
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explores flat minima in generative models.
Uses Sharpness-Aware Minimization (SAM) to explicitly control flatness.
Improves both generative performance and robustness, including under 8-bit quantization (see the probe below).
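To make the quantization-robustness claim concrete, here is a hypothetical per-tensor symmetric 8-bit round-trip probe of the kind one could apply post-training before re-measuring FID; it is not the paper's exact protocol.

```python
import torch

@torch.no_grad()
def quantize_dequantize_int8(model):
    """Round-trip each weight tensor through symmetric 8-bit levels.

    A flat minimum should tolerate the rounding error |w - Q(w)|,
    leaving sample quality (e.g., FID) nearly unchanged.
    """
    for p in model.parameters():
        scale = p.abs().max() / 127.0
        if scale > 0:
            p.copy_(torch.round(p / scale).clamp_(-127, 127) * scale)
```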