Deeply Supervised Flow-Based Generative Models

📅 2025-03-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing flow-based generative models supervise velocity only at the final layer, underutilizing intermediate representations and slowing training convergence. To address this, the paper proposes DeepFlow, a framework with three key components: (1) transformer layers partitioned into balanced branches for hierarchical feature extraction; (2) layer-wise velocity alignment, enabling multi-level velocity supervision across branches; and (3) a lightweight Velocity Refiner with Acceleration (VeRA) block inserted between adjacent branches to align intermediate velocity features. Experiments show that DeepFlow converges 8× faster on ImageNet at equivalent performance, and further reduces FID by 2.6 while halving training time compared to previous flow-based models without classifier-free guidance. It also outperforms baselines on text-to-image generation, as evaluated on MSCOCO and zero-shot GenEval. These results indicate that deep velocity supervision improves both generation quality and training efficiency.

πŸ“ Abstract
Flow-based generative models have charted an impressive path across multiple visual generation tasks by adhering to a simple principle: learning velocity representations of a linear interpolant. However, we observe that training velocity solely from the final-layer output underutilizes the rich inter-layer representations, potentially impeding model convergence. To address this limitation, we introduce DeepFlow, a novel framework that enhances velocity representation through inter-layer communication. DeepFlow partitions transformer layers into balanced branches with deep supervision and inserts a lightweight Velocity Refiner with Acceleration (VeRA) block between adjacent branches, which aligns the intermediate velocity features within transformer blocks. Powered by the improved deep supervision via internal velocity alignment, DeepFlow converges 8 times faster on ImageNet with equivalent performance and further reduces FID by 2.6 while halving training time compared to previous flow-based models without classifier-free guidance. DeepFlow also outperforms baselines in text-to-image generation tasks, as evidenced by evaluations on MSCOCO and zero-shot GenEval.
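The abstract's core objective, learning the velocity of a linear interpolant, can be sketched in a few lines. This is a generic flow-matching illustration with assumed function names, not the paper's code: points on the straight path from noise x0 to data x1 have the constant target velocity x1 - x0.

```python
import numpy as np

def linear_interpolant(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t in [0, 1]."""
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    """Time derivative of the linear interpolant: d/dt x_t = x1 - x0."""
    return x1 - x0

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)  # noise sample
x1 = rng.standard_normal(4)  # data sample
t = 0.3

x_t = linear_interpolant(x0, x1, t)
v = velocity_target(x0, x1)

# Finite differences confirm the velocity is constant along the path.
eps = 1e-6
numeric_v = (linear_interpolant(x0, x1, t + eps) - x_t) / eps
assert np.allclose(numeric_v, v, atol=1e-4)
```

A flow model is then trained to regress this target velocity from (x_t, t); DeepFlow's contribution is to supervise that regression at multiple depths rather than only at the final layer.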
Problem

Research questions and friction points this paper is trying to address.

Supervising velocity only from the final-layer output underutilizes the rich inter-layer representations of flow-based generative models.
Final-layer-only supervision impedes model convergence and inflates training cost.
Prior flow-based baselines leave room for improvement on text-to-image generation benchmarks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

DeepFlow enhances velocity representation via inter-layer communication, partitioning transformer layers into balanced branches with deep supervision.
Introduces a lightweight Velocity Refiner with Acceleration (VeRA) block between adjacent branches to align intermediate velocity features.
Converges 8 times faster on ImageNet and reduces FID by 2.6 through improved deep supervision.
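The deep-supervision idea above can be sketched as a weighted sum of per-branch losses against one shared velocity target. The function name, uniform weighting, and plain MSE are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def deep_supervision_loss(branch_velocities, target_v, weights=None):
    """Weighted sum of per-branch MSE losses against one shared velocity target.

    branch_velocities: predicted velocities from each branch (intermediate
    branches plus the final layer), each shaped like target_v.
    Uniform weights are an assumption made here for illustration.
    """
    if weights is None:
        weights = [1.0] * len(branch_velocities)
    return sum(
        w * float(np.mean((v_pred - target_v) ** 2))
        for w, v_pred in zip(weights, branch_velocities)
    )

# Toy check: three branches, with the final branch closest to the target.
target = np.ones(4)
branches = [np.zeros(4), 0.5 * np.ones(4), np.ones(4)]
loss = deep_supervision_loss(branches, target)
assert loss == 1.0 + 0.25 + 0.0
```

Because every branch receives a gradient signal from the same velocity target, intermediate layers are trained directly rather than only through backpropagation from the final layer, which is the mechanism the paper credits for faster convergence.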