🤖 AI Summary
This work addresses a limitation of flow matching: it trains with a fixed mean-squared-error loss, which can leave generated samples imperfectly aligned with the target data distribution. The authors propose Continuous Adversarial Flow (CAF), a continuous-time flow model trained with an adversarial objective, in which a learned discriminator replaces the fixed loss and guides training. CAF is intended primarily as a post-training method for existing flow-matching models, although it can also train models from scratch. On ImageNet at 256px resolution, post-training with CAF improves guidance-free FID from 8.26 to 3.63 for latent-space SiT and from 7.17 to 3.57 for pixel-space JiT. It also improves guided generation (FID from 2.06 to 1.53 for SiT and from 1.86 to 1.80 for JiT) and achieves better results on the GenEval and DPG text-to-image benchmarks.
📝 Abstract
We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a different generalized distribution, which empirically produces samples that are better aligned with the target data distribution. Our method is primarily proposed for post-training existing flow-matching models, although it can also train models from scratch. On the ImageNet 256px generation task, our post-training substantially improves the guidance-free FID of latent-space SiT from 8.26 to 3.63 and of pixel-space JiT from 7.17 to 3.57. It also improves guided generation, reducing FID from 2.06 to 1.53 for SiT and from 1.86 to 1.80 for JiT. We further evaluate our approach on text-to-image generation, where it achieves improved results on both the GenEval and DPG benchmarks.
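To make the contrast between the two objectives concrete, here is a minimal NumPy sketch. The first function is the standard flow-matching mean-squared-error loss on a linear interpolation path. The second is only an illustration of what "replacing the fixed criterion with a learned discriminator" could look like: the non-saturating logistic loss, the inputs passed to the critic, and the names `velocity_fn` and `critic_fn` are all assumptions made for this example and are not taken from the paper. The convention that `x0` is noise and `x1` is data is likewise assumed.

```python
import numpy as np

def flow_matching_mse(velocity_fn, x0, x1, t):
    """Standard flow-matching loss on the linear path x_t = (1 - t) * x0 + t * x1.

    x0: noise samples, x1: data samples (assumed convention), t: times in [0, 1].
    The regression target for the velocity is x1 - x0.
    """
    t = t.reshape(-1, 1)
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))

def adversarial_generator_loss(velocity_fn, critic_fn, x0, x1, t):
    """Hypothetical adversarial counterpart (an assumption, not the paper's loss).

    A critic scores the predicted velocity given (x_t, t); the generator
    minimizes the non-saturating logistic loss softplus(-logit).
    """
    t = t.reshape(-1, 1)
    x_t = (1.0 - t) * x0 + t * x1
    pred = velocity_fn(x_t, t)
    logits = critic_fn(x_t, t, pred)
    # log(1 + exp(-logits)), computed stably
    return float(np.mean(np.logaddexp(0.0, -logits)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((8, 2))          # toy "noise"
    x1 = rng.standard_normal((8, 2)) + 3.0    # toy "data"
    t = rng.uniform(size=8)

    zero_velocity = lambda x, t: np.zeros_like(x)           # placeholder model
    toy_critic = lambda x, t, v: -np.sum(v ** 2, axis=1)    # placeholder critic

    print(flow_matching_mse(zero_velocity, x0, x1, t))
    print(adversarial_generator_loss(zero_velocity, toy_critic, x0, x1, t))
```

In a real setup both `velocity_fn` and `critic_fn` would be neural networks trained in alternation; the fixed toy critic here only shows where the learned signal enters in place of the squared error.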