🤖 AI Summary
The interplay between the instantaneous and mean velocity fields in MeanFlow remains poorly understood, limiting both few-step generation quality and training efficiency. Method: We establish that a well-learned instantaneous velocity field is a prerequisite for estimating the mean velocity field, and propose a time-interval-based curriculum: prioritize short-interval instantaneous velocities early to accelerate convergence, then progressively shift emphasis to long-interval mean velocities for refinement, dynamically decoupling and coordinating the two objectives. The approach combines the DiT architecture, coupled velocity-field modeling, task-affinity analysis, and phased optimization. Results: On ImageNet 256×256, the method achieves a FID of 2.87 with 1-NFE sampling, a significant improvement over the baseline (3.43). It also matches baseline performance with 2.5× less training, or with a smaller DiT-L backbone. This work provides the first systematic analysis and improvement of MeanFlow's training dynamics.
📝 Abstract
MeanFlow promises high-quality generative modeling in a few steps by jointly learning instantaneous and average velocity fields. Yet the underlying training dynamics remain unclear. We analyze the interaction between the two velocities and find: (i) a well-established instantaneous velocity is a prerequisite for learning the average velocity; (ii) learning of the instantaneous velocity benefits from the average velocity when the temporal gap is small, but degrades as the gap increases; and (iii) task-affinity analysis indicates that smooth learning of large-gap average velocities, essential for one-step generation, depends on the prior formation of accurate instantaneous and small-gap average velocities. Guided by these observations, we design an effective training scheme that accelerates the formation of the instantaneous velocity, then shifts emphasis from short- to long-interval average velocity. Our enhanced MeanFlow training yields faster convergence and significantly better few-step generation: with the same DiT-XL backbone, our method reaches an impressive FID of 2.87 on 1-NFE ImageNet 256×256, compared to 3.43 for the conventional MeanFlow baseline. Alternatively, our method matches the performance of the MeanFlow baseline with 2.5× shorter training time, or with a smaller DiT-L backbone.
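The short-to-long-interval curriculum described above can be sketched as a schedule on the sampled time pair (r, t): early in training the allowed gap t − r is kept near zero, so the objective reduces to fitting the instantaneous velocity, and the cap on the gap is then ramped up so large-interval average velocities dominate later. The schedule shape, phase boundary, and all parameter names below are illustrative assumptions, not the paper's exact scheme.

```python
import random

def interval_cap(progress, warmup=0.25, cap_final=1.0):
    """Maximum allowed gap t - r at a given training progress in [0, 1].

    Before `warmup`, the cap stays tiny so training focuses on the
    instantaneous velocity (r ~= t); afterwards it ramps linearly to
    `cap_final`, shifting emphasis to long-interval average velocities
    needed for one-step (1-NFE) generation. Illustrative schedule only.
    """
    floor = 0.05 * cap_final  # small nonzero gap in the early phase
    if progress < warmup:
        return floor
    ramp = min(1.0, (progress - warmup) / (1.0 - warmup))
    return floor + (cap_final - floor) * ramp

def sample_time_pair(progress, rng=random):
    """Sample (r, t) with 0 <= r <= t <= 1 under the curriculum cap."""
    t = rng.uniform(0.0, 1.0)
    gap = rng.uniform(0.0, interval_cap(progress))
    r = max(0.0, t - gap)
    return r, t
```

In a MeanFlow-style training loop, `sample_time_pair(step / total_steps)` would replace the usual uniform sampling of (r, t); everything else (the average-velocity identity and its loss) stays unchanged.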