Efficient generative adversarial networks using linear additive-attention Transformers

📅 2024-01-17
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Deep generative models such as diffusion models and GANs deliver state-of-the-art image generation but suffer from high computational cost and training instability, limiting their deployment in resource-constrained settings and raising carbon emissions. To address this, the authors propose LadaGAN, an efficient GAN architecture built on a linear additive-attention Transformer block (Ladaformer). Its core innovation is a single-vector linear additive-attention mechanism that reduces per-head attention complexity from *O*(*n*²) to *O*(*n*), enabling, for the first time, the stable use of linear Transformers in both the generator and the discriminator. Evaluated on multi-resolution benchmarks, LadaGAN outperforms leading CNN- and Transformer-based GANs, achieves 10–100× faster inference, attains image fidelity comparable to diffusion models, and significantly reduces training energy consumption and the associated carbon footprint.

📝 Abstract
Although the capacity of deep generative models for image generation, such as Diffusion Models (DMs) and Generative Adversarial Networks (GANs), has dramatically improved in recent years, much of their success can be attributed to computationally expensive architectures. This has limited their adoption and use to research laboratories and companies with large resources, while significantly raising the carbon footprint of training, fine-tuning, and inference. In this work, we present a novel GAN architecture that we call LadaGAN. This architecture is based on a linear attention Transformer block named Ladaformer. The main component of this block is a linear additive-attention mechanism that computes a single attention vector per head instead of the quadratic dot-product attention. We employ Ladaformer in both the generator and discriminator, which reduces the computational complexity and overcomes the training instabilities often associated with Transformer GANs. LadaGAN consistently outperforms existing convolutional and Transformer GANs on benchmark datasets at different resolutions while being significantly more efficient. Moreover, LadaGAN shows competitive performance compared to state-of-the-art multi-step generative models (e.g., DMs) while using orders of magnitude fewer computational resources.
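The single-vector idea in the abstract can be sketched concretely. The following NumPy snippet is a minimal illustration of one head of linear additive attention in the Fastformer style that Ladaformer builds on: a learned vector scores each query with a single scalar, a softmax over positions pools the queries into one global attention vector, and that vector modulates every key elementwise. All steps cost O(n·d) rather than the O(n²·d) of dot-product attention. The function name, shapes, and the exact way the global vector is combined with the keys are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def additive_attention(Q, K, w):
    """One head of single-vector linear additive attention (a sketch).

    Q, K: (n, d) query and key matrices for n positions.
    w:    (d,) learned scoring vector (illustrative parameter).
    Returns an (n, d) attended representation.
    """
    n, d = Q.shape
    # One scalar score per position instead of an n x n matrix: O(n*d)
    scores = Q @ w / np.sqrt(d)                    # (n,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                           # softmax over positions
    # Pool queries into a single global attention vector: O(n*d)
    g = alpha @ Q                                  # (d,)
    # Broadcast the global vector against every key elementwise: O(n*d)
    return K * g                                   # (n, d)
```

Because the softmax runs over a length-n score vector rather than an n×n matrix, memory and compute grow linearly with sequence length, which is what makes pairing such blocks in both generator and discriminator tractable.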
Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost in GANs for image generation
Overcoming training instabilities in Transformer-based GANs
Improving efficiency without sacrificing generative performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear additive-attention mechanism reduces complexity
Ladaformer block enhances GAN stability and efficiency
Outperforms convolutional and Transformer GANs efficiently
Emilio Morales-Juarez
Facultad de Ingeniería, Universidad Nacional Autónoma de México, Mexico
Gibran Fuentes-Pineda
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico