The GAN is dead; long live the GAN! A Modern GAN Baseline

📅 2025-01-09
🤖 AI Summary
GAN training frequently suffers from non-convergence and mode collapse, and relies heavily on ad-hoc heuristics, leaving the field without a stable, efficient, and theoretically grounded modern baseline. To address this, the authors derive a well-behaved regularized relativistic GAN loss and prove that it admits local convergence guarantees, unlike most existing relativistic losses. Building on this loss, they introduce R3GAN, a minimalist, theory-driven baseline that starts from StyleGAN2, discards its ad-hoc stabilization tricks, and modernizes its backbone, retaining only the core adversarial objective and theoretically justified regularization. Evaluated on FFHQ, ImageNet, CIFAR, and Stacked MNIST, R3GAN consistently surpasses StyleGAN2 and compares favorably against state-of-the-art GANs, and even some diffusion models, on standard metrics such as FID. R3GAN thus offers a simple GAN baseline that combines rigorous theoretical grounding with strong empirical performance.
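To make the loss concrete, here is a minimal sketch of a relativistic pairing objective of the kind the paper regularizes. It assumes the standard RpGAN form f(D(fake) − D(real)) with f = softplus; the function names are illustrative, not the paper's API, and the zero-centered gradient penalties the paper adds on real and fake samples are only noted in comments, since they require autograd over discriminator inputs.

```python
import math

def softplus(t: float) -> float:
    """Numerically stable log(1 + e^t)."""
    return math.log1p(math.exp(-abs(t))) + max(t, 0.0)

def rpgan_d_loss(real_scores, fake_scores):
    """Relativistic pairing discriminator loss: push D(real) above D(fake).

    In the paper's regularized variant, zero-centered gradient penalties
    (R1 on real samples, R2 on fake samples) would be added to this term.
    """
    return sum(softplus(f - r) for r, f in zip(real_scores, fake_scores)) / len(real_scores)

def rpgan_g_loss(real_scores, fake_scores):
    """Generator loss: the mirrored objective, push D(fake) above D(real)."""
    return sum(softplus(r - f) for r, f in zip(real_scores, fake_scores)) / len(real_scores)
```

Because each term compares a real/fake pair rather than judging samples in isolation, the discriminator cannot drive its loss to zero by scoring all samples the same way, which is the intuition behind the loss's resistance to mode dropping.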

📝 Abstract
There is a widely-spread claim that GANs are difficult to train, and GAN architectures in the literature are littered with empirical tricks. We provide evidence against this claim and build a modern GAN baseline in a more principled manner. First, we derive a well-behaved regularized relativistic GAN loss that addresses issues of mode dropping and non-convergence that were previously tackled via a bag of ad-hoc tricks. We analyze our loss mathematically and prove that it admits local convergence guarantees, unlike most existing relativistic losses. Second, our new loss allows us to discard all ad-hoc tricks and replace outdated backbones used in common GANs with modern architectures. Using StyleGAN2 as an example, we present a roadmap of simplification and modernization that results in a new minimalist baseline -- R3GAN. Despite being simple, our approach surpasses StyleGAN2 on FFHQ, ImageNet, CIFAR, and Stacked MNIST datasets, and compares favorably against state-of-the-art GANs and diffusion models.
Problem

Research questions and friction points this paper is trying to address.

Generative Adversarial Networks
Training Difficulty
Performance Instability
Innovation

Methods, ideas, or system contributions that make the work stand out.

R3GAN
GAN Loss Function
Stability and Performance Improvement