Align Your Flow: Scaling Continuous-Time Flow Map Distillation

📅 2025-06-17

🤖 AI Summary

Diffusion and flow-based models achieve high generation quality but sample slowly because they require many steps. Consistency models can be distilled into single-step generators, but their quality degrades as the number of sampling steps increases. This paper proposes a continuous-time flow map distillation framework built on two new continuous-time training objectives that generalize existing consistency and flow matching objectives. It adds autoguidance during distillation, where a low-quality model provides the guidance signal, and a lightweight adversarial fine-tuning stage that further improves fidelity with minimal loss of sample diversity. The resulting flow maps connect any two noise levels in a single step and support both image and text-to-image generation. On ImageNet 64×64 and 512×512, the method achieves state-of-the-art few-step generation performance with at most 4 sampling steps. In text-to-image synthesis, it outperforms all existing non-adversarially trained few-step samplers.

📝 Abstract

Diffusion- and flow-based models have emerged as state-of-the-art generative modeling approaches, but they require many sampling steps. Consistency models can distill these models into efficient one-step generators; however, unlike flow- and diffusion-based methods, their performance inevitably degrades when increasing the number of steps, which we show both analytically and empirically. Flow maps generalize these approaches by connecting any two noise levels in a single step and remain effective across all step counts. In this paper, we introduce two new continuous-time objectives for training flow maps, along with additional novel training techniques, generalizing existing consistency and flow matching objectives. We further demonstrate that autoguidance can improve performance, using a low-quality model for guidance during distillation, and an additional boost can be achieved by adversarial finetuning, with minimal loss in sample diversity. We extensively validate our flow map models, called Align Your Flow, on challenging image generation benchmarks and achieve state-of-the-art few-step generation performance on both ImageNet 64x64 and 512x512, using small and efficient neural networks. Finally, we show text-to-image flow map models that outperform all existing non-adversarially trained few-step samplers in text-conditioned synthesis.
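To make the flow map idea concrete, here is a minimal toy sketch, not the paper's model. It assumes 1-D Gaussian data with standard deviation `SIGMA_DATA` and the linear interpolant x_t = (1 - t) * x0 + t * eps, for which the flow map between two noise levels has a closed form. The names `sigma`, `flow_map`, and `sample` are illustrative only. The point it shows is that a flow map jumps from any noise level t to any other level s in one call, so a one-step schedule and a multi-step schedule land on the same sample.

```python
import math

SIGMA_DATA = 0.5  # assumed std of the toy 1-D Gaussian "data" distribution


def sigma(t):
    """Std of x_t = (1 - t) * x0 + t * eps, with x0 ~ N(0, SIGMA_DATA^2), eps ~ N(0, 1)."""
    return math.sqrt(((1.0 - t) * SIGMA_DATA) ** 2 + t ** 2)


def flow_map(x, t, s):
    """Map a state at noise level t directly to noise level s.

    For this Gaussian toy case the probability-flow ODE is linear in x,
    so its exact solution simply rescales x by sigma(s) / sigma(t).
    A trained flow map network would approximate this jump instead.
    """
    return x * sigma(s) / sigma(t)


def sample(x_noise, times):
    """Apply the flow map along a decreasing schedule of noise levels."""
    x = x_noise
    for t, s in zip(times[:-1], times[1:]):
        x = flow_map(x, t, s)
    return x


# Same starting noise, two different step counts.
one_step = sample(1.2, [1.0, 0.0])
four_step = sample(1.2, [1.0, 0.75, 0.5, 0.25, 0.0])
```

Because the toy map is exact, `one_step` and `four_step` agree up to floating-point error. For a learned flow map the two would differ by the approximation error of the network.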

Problem

Research questions and friction points this paper is trying to address.

Reducing the number of sampling steps that diffusion- and flow-based models require

Keeping sample quality from degrading as the step count increases, a weakness of consistency models

Advancing text-to-image synthesis with few-step samplers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two new continuous-time objectives for training flow maps

Autoguidance during distillation, using a low-quality model for guidance

Adversarial finetuning that improves sample quality with minimal loss in sample diversity
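As a rough illustration of autoguidance, the sketch below uses the extrapolation rule commonly associated with the technique: the main model's prediction is pushed away from a weaker model's prediction by a guidance weight `w`. This is an assumption about the general form only. How the paper wires guidance into its distillation objectives is not described on this page, and the function name `autoguide` and the numeric predictions are hypothetical.

```python
def autoguide(main_pred, weak_pred, w):
    """Extrapolate from the weak prediction through the main one.

    w = 1.0 returns the main prediction unchanged; w > 1.0 moves further
    away from the weak model's prediction.
    """
    return [weak + w * (main - weak) for main, weak in zip(main_pred, weak_pred)]


main_pred = [1.0, 2.0]  # hypothetical output of the full-quality teacher
weak_pred = [0.5, 1.0]  # hypothetical output of a deliberately weaker model

unguided = autoguide(main_pred, weak_pred, 1.0)
guided = autoguide(main_pred, weak_pred, 2.0)
```

In a distillation setting, the guided prediction would stand in for the teacher signal that the student is trained to match.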
