Cross-scale Aligned Supervision for Training GANs

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses inconsistent cross-scale generation trajectories in multi-scale generative adversarial networks (GANs): intermediate-resolution outputs can each look realistic at their own scale yet fail to depict the same sample that the final output shows. The authors call this cross-scale trajectory misalignment and propose the Cross-scale Aligned Transformer (CAT) to resolve it. CAT keeps the discriminator scale-wise and adds a simple consistency regularizer on the generator side that aligns intermediate outputs with the final output, so that later stages refine earlier outputs rather than drift toward a different sample. On class-conditional ImageNet-256, CAT-H/2 achieves an FID-50K of 1.56 with single-step inference after only 60 training epochs, outperforming strong one-step GAN and diffusion/flow baselines.
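
One way to read the objective described here is as a scale-wise adversarial term plus an alignment term. The form below is only an illustrative sketch: the symbols $x_s$ (output of stage $s$), $S$ (number of stages), $\downarrow_s$ (resampling the final output to the resolution of stage $s$), the distance $d$, and the weight $\lambda$ are all assumptions introduced for this note and are not taken from the paper.

$$
\mathcal{L}_G \;=\; \sum_{s=1}^{S} \mathcal{L}_{\mathrm{adv}}^{(s)}(x_s) \;+\; \lambda \sum_{s=1}^{S-1} d\big(x_s,\ \downarrow_s(x_S)\big)
$$

Under this reading, the first sum pushes every stage toward realism at its own resolution, while the second sum is the only term that ties the stages to one another.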

📝 Abstract

Modern GANs often introduce adversarial supervision on intermediate generator outputs and interpret the resulting multi-stage synthesis as coarse-to-fine hierarchical generation. In this work, we challenge this interpretation. We argue that standard scale-wise adversarial supervision does not construct a proper coarse-to-fine hierarchy: each intermediate image is independently pushed toward the real distribution at its own resolution, but this scale-wise realism does not ensure that outputs across stages represent the same generated sample. Moreover, the scale-specific image produced at each stage is not used as an explicit refinement target for the subsequent stage. Therefore, its adversarial loss can improve a scale-specific output without constraining later stages to preserve the same sample trajectory, allowing them to move toward a different sample rather than refine the previous output. We refer to this as the cross-scale trajectory misalignment problem. To resolve it, we propose CAT, a Cross-scale Aligned Transformer for multi-scale adversarial generation. CAT keeps the discriminator scale-wise, so each intermediate output is evaluated at its own resolution, while adding a simple generator-side consistency regularization that aligns intermediate outputs with the final output. On class-conditional ImageNet-256, CAT-H/2 achieves an FID-50K of 1.56 with one-step inference after only 60 training epochs, outperforming strong one-step GAN and diffusion/flow baselines.
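
To make the idea of aligning intermediate outputs with the final output concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not specify the regularizer, so the average-pool downsampling, the mean-squared-error distance, and the helper names `avg_pool`, `mse`, and `consistency_loss` are all assumptions chosen for illustration. Images are plain nested lists so the example runs without a tensor library.

```python
def avg_pool(img, factor):
    """Downsample a square 2D image (list of rows) by averaging
    non-overlapping factor x factor blocks. Assumed operator, not from the paper."""
    size = len(img) // factor
    return [
        [
            sum(
                img[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ) / factor ** 2
            for c in range(size)
        ]
        for r in range(size)
    ]


def mse(a, b):
    """Mean squared error between two 2D images of equal shape."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def consistency_loss(intermediates, final):
    """Hypothetical alignment term: average, over stages, of the distance
    between each intermediate output and the final output downsampled to
    that stage's resolution."""
    total = 0.0
    for inter in intermediates:
        factor = len(final) // len(inter)
        total += mse(inter, avg_pool(final, factor))
    return total / len(intermediates)


# Toy 4x4 "final output" with pixel values 0..15.
final = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

# An intermediate 2x2 output that is exactly the coarse version of `final`.
aligned = [avg_pool(final, 2)]

# An intermediate 2x2 output that depicts something else entirely.
drifted = [[[0.0, 0.0], [0.0, 0.0]]]

print(consistency_loss(aligned, final))  # 0.0
print(consistency_loss(drifted, final))  # 73.25
```

A scale-wise adversarial loss alone would not distinguish `aligned` from `drifted` as long as both looked realistic at 2x2; the extra term is what penalizes the second case. In an actual training setup this term would presumably be computed on batched tensors and added to the generator loss with some weight, but those details are not given in the abstract.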

Problem

Research questions and friction points this paper is trying to address.

cross-scale alignment

GANs

trajectory misalignment

multi-scale generation

adversarial supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-scale alignment

coarse-to-fine generation

adversarial supervision