🤖 AI Summary
This work addresses the limitations of generative adversarial networks (GANs) in producing images of sufficient diversity and quality across varying data scales. To this end, the authors propose FakeTwins, a self-supervised loss mechanism, combined with a cross-architecture discriminator consistency strategy. Leveraging a pretrained network as a source of self-supervised signals, the method trains on multi-scale feature maps extracted jointly from CNN and Vision Transformer backbones, integrating their complementary priors to improve training stability and generalization. Evaluated on 17 diverse datasets spanning different image domains and data scales, the proposed approach consistently outperforms current state-of-the-art methods, achieving substantial improvements in Fréchet Inception Distance (FID) and generating images with markedly better quality and diversity.
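The summary does not spell out the loss itself, so the sketch below is only illustrative: it shows one plausible way to tap multi-scale features from frozen pretrained CNN and ViT backbones (torchvision's ResNet-50 and ViT-B/16 are assumed choices) and match simple feature statistics between real and generated batches. The block indices, input normalization, and L1 moment matching are placeholders, not the paper's actual FakeTwins objective or discriminator-consistency strategy.

```python
# Minimal sketch of a multi-scale, cross-architecture feature loss.
# Assumptions: PyTorch + torchvision; the real FakeTwins loss may differ.
import torch
import torch.nn.functional as F
from torchvision import models
from torchvision.models.feature_extraction import create_feature_extractor

# Frozen pretrained CNN backbone providing multi-scale feature maps.
cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
cnn_feats = create_feature_extractor(
    cnn, return_nodes={"layer1": "s1", "layer2": "s2", "layer3": "s3", "layer4": "s4"}
).eval()
for p in cnn_feats.parameters():
    p.requires_grad_(False)

# Frozen pretrained ViT backbone; intermediate blocks tapped via forward hooks.
vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT).eval()
for p in vit.parameters():
    p.requires_grad_(False)

vit_tokens = {}
def _save(name):
    def hook(_module, _inputs, output):
        vit_tokens[name] = output  # [B, 1 + N, D] token sequence
    return hook

for i in (3, 7, 11):  # arbitrary block choice for this sketch
    vit.encoder.layers[i].register_forward_hook(_save(f"block{i}"))

def multi_scale_features(x):
    """Collect CNN and ViT features for a batch of images in [-1, 1]."""
    x = (x + 1) / 2  # map to [0, 1]; ImageNet mean/std normalization omitted for brevity
    feats = list(cnn_feats(x).values())
    vit_tokens.clear()
    vit(F.interpolate(x, size=224, mode="bilinear", align_corners=False))
    feats += [t[:, 1:] for t in vit_tokens.values()]  # drop the class token
    return feats

def fake_twins_style_loss(fake, real):
    """Match per-scale feature statistics of generated and real batches.
    A simple moment-matching stand-in for the paper's self-supervised objective."""
    loss = fake.new_zeros(())
    for f_fake, f_real in zip(multi_scale_features(fake), multi_scale_features(real)):
        loss = loss + F.l1_loss(f_fake.mean(dim=0), f_real.mean(dim=0))
    return loss
```

In this sketch the backbones are frozen, so gradients flow only through the generated images; in practice the loss would be added to the adversarial objective when updating the generator.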