🤖 AI Summary
GAN training frequently suffers from instability and mode collapse. To address this, we propose the Unconditional Discriminator (UCD) mechanism: by removing conditional inputs from the discriminator, UCD compels it to learn more robust, global representations of the data distribution, thereby strengthening its supervisory signal to the generator and promoting convergence toward a Nash equilibrium at the training dynamics level. Grounded in standard GAN theory, UCD is architecture-agnostic and plug-and-play compatible with existing frameworks. Theoretical analysis and adversarial training experiments confirm improved training stability. Empirically, UCD achieves a state-of-the-art FID of 1.47 on ImageNet-64—surpassing StyleGAN-XL and multiple advanced diffusion models—demonstrating exceptional balance between generation quality and computational efficiency.
📝 Abstract
Adversarial training turns out to be the key to one-step generation, especially for Generative Adversarial Network (GAN) and diffusion model distillation. Yet in practice, GAN training hardly converges properly and struggles in mode collapse. In this work, we quantitatively analyze the extent of Nash equilibrium in GAN training, and conclude that redundant shortcuts by inputting condition in $D$ disables meaningful knowledge extraction. We thereby propose to employ an unconditional discriminator (UCD), in which $D$ is enforced to extract more comprehensive and robust features with no condition injection. In this way, $D$ is able to leverage better knowledge to supervise $G$, which promotes Nash equilibrium in GAN literature. Theoretical guarantee on compatibility with vanilla GAN theory indicates that UCD can be implemented in a plug-in manner. Extensive experiments confirm the significant performance improvements with high efficiency. For instance, we achieved extbf{1.47 FID} on the ImageNet-64 dataset, surpassing StyleGAN-XL and several state-of-the-art one-step diffusion models. The code will be made publicly available.