🤖 AI Summary
One-step generative models such as MeanFlow (MF) face two key challenges: (1) a training target that depends on the network itself, entangling the objective with the model's own predictions and destabilizing optimization, and (2) a classifier-free guidance (CFG) scale that is fixed at training time and cannot be adjusted afterward. This work proposes iMF, an improved framework addressing both issues. First, iMF recasts the objective as a loss on the instantaneous velocity, re-parameterized through a network that predicts the average velocity field; this yields a standard regression problem and more stable training. Second, it treats the guidance scale as an explicit conditioning variable, so guidance strength remains tunable at test time without retraining. The diverse conditions are handled through an in-context conditioning mechanism, which reduces model size and benefits performance, and the model is trained end-to-end from scratch without knowledge distillation. On ImageNet 256×256, iMF achieves a state-of-the-art single-step (1-NFE) FID of 1.72, substantially outperforming prior one-step methods and narrowing the gap to multi-step models, thereby strengthening one-step generation as a viable, standalone paradigm.
📝 Abstract
MeanFlow (MF) has recently been established as a framework for one-step generative modeling. However, its "fast-forward" nature introduces key challenges in both the training objective and the guidance mechanism. First, the original MF training target depends not only on the underlying ground-truth fields but also on the network itself. To address this issue, we recast the objective as a loss on the instantaneous velocity $v$, re-parameterized by a network that predicts the average velocity $u$. Our reformulation yields a more standard regression problem and improves training stability. Second, the original MF fixes the classifier-free guidance scale during training, which sacrifices flexibility. We tackle this issue by formulating guidance as explicit conditioning variables, thereby retaining flexibility at test time. The diverse conditions are processed through in-context conditioning, which reduces model size and benefits performance. Overall, our **improved MeanFlow** (**iMF**) method, trained entirely from scratch, achieves **1.72** FID with a single function evaluation (1-NFE) on ImageNet 256×256. iMF substantially outperforms prior methods of this kind and closes the gap with multi-step methods while using no distillation. We hope our work will further advance fast-forward generative modeling as a stand-alone paradigm.
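To make the re-parameterization concrete, the following is a minimal sketch of one plausible reading of the iMF objective, not the paper's exact implementation. It assumes a rectified-flow interpolation path $z_t = (1-t)x + t\epsilon$ (so the ground-truth instantaneous velocity is $v = \epsilon - x$), a network `net(z, r, t)` that predicts the average velocity $u$ over $[r, t]$, and uses the MeanFlow identity $v = u + (t-r)\,\mathrm{d}u/\mathrm{d}t$ to express the predicted instantaneous velocity, which is then regressed against the fixed ground-truth field rather than a network-dependent target:

```python
import torch
from torch.func import jvp


def imf_loss(net, x, eps, r, t):
    """Sketch of a v-space regression loss for a mean-velocity network.

    net: callable (z, r, t) -> u, predicting the average velocity over [r, t].
    x, eps: data and noise samples; r, t: per-sample time bounds (broadcastable).
    The path convention and weighting here are illustrative assumptions.
    """
    # Linear interpolation path z_t = (1 - t) * x + t * eps,
    # whose ground-truth instantaneous velocity is v = eps - x.
    z = (1 - t) * x + t * eps
    v = eps - x

    # One forward pass with a JVP along the tangent
    # (dz/dt, dr/dt, dt/dt) = (v, 0, 1) gives both the network
    # output u and its total time derivative du/dt.
    u, dudt = jvp(
        net,
        (z, r, t),
        (v, torch.zeros_like(r), torch.ones_like(t)),
    )

    # MeanFlow identity: the implied instantaneous velocity is
    # v_pred = u + (t - r) * du/dt.
    v_pred = u + (t - r) * dudt

    # Plain regression on v: the target is the fixed ground-truth
    # field, not a quantity that depends on the network itself.
    return ((v_pred - v) ** 2).mean()
```

Because the target `v` is independent of the network, gradients flow only through the prediction side, which is the usual shape of a supervised regression problem.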
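The guidance-as-conditioning idea can likewise be sketched in code. The module below is a hedged illustration, not the paper's architecture: it assumes a transformer backbone operating on a token sequence, embeds the class label and the CFG scale `omega` (a hypothetical name for the guidance-scale input) as extra tokens, and prepends them to the sequence, so guidance strength is an input the model attends to in context rather than a constant baked into training:

```python
import torch
import torch.nn as nn


class InContextConditioner(nn.Module):
    """Illustrative sketch: fold class label and guidance scale into the
    token sequence of a transformer backbone. The exact token layout and
    embedding choices in iMF may differ."""

    def __init__(self, dim, num_classes):
        super().__init__()
        # +1 slot for the "unconditional" (null) class used by CFG-style training.
        self.class_emb = nn.Embedding(num_classes + 1, dim)
        # Map the scalar guidance scale to an embedding vector.
        self.scale_mlp = nn.Sequential(
            nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim)
        )

    def forward(self, tokens, labels, omega):
        # tokens: (B, N, D) image/latent tokens
        # labels: (B,) class indices; omega: (B, 1) guidance scales
        c = self.class_emb(labels).unsqueeze(1)   # (B, 1, D)
        w = self.scale_mlp(omega).unsqueeze(1)    # (B, 1, D)
        # Prepend both condition tokens; the backbone sees them in context.
        return torch.cat([c, w, tokens], dim=1)   # (B, N + 2, D)
```

At test time, changing `omega` changes only an input token, so the guidance strength can be swept freely from a single trained model.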