🤖 AI Summary
To address the high computational cost and low sampling efficiency of diffusion and flow-matching models, this paper proposes Modular MeanFlow (MMF), a one-step generative modeling framework. MMF learns a time-averaged velocity field, enabling high-fidelity sample generation with a single function evaluation. Its key contributions are: (1) a differential identity linking instantaneous and average velocities, which yields a family of differentiable loss functions that avoids higher-order derivatives; (2) a gradient modulation mechanism and a curriculum-style warmup schedule that substantially improve training stability; and (3) a formulation that unifies and generalizes existing consistency-based and flow-matching paradigms. Experiments on image synthesis and trajectory modeling show that MMF achieves competitive sample quality, robust convergence, and strong generalization, particularly in low-data and out-of-distribution settings.
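Concretely, the identity in contribution (1) can be sketched as follows (illustrative notation, not necessarily the paper's: $z_t$ is the state at time $t$, $v$ the instantaneous velocity, $u$ the average velocity over $[r, t]$):

$$u(z_t, r, t) \;=\; \frac{1}{t-r}\int_r^t v(z_\tau, \tau)\,d\tau$$

Differentiating $(t-r)\,u(z_t, r, t)$ with respect to $t$ gives

$$v(z_t, t) \;=\; u(z_t, r, t) + (t-r)\,\frac{d}{dt}\,u(z_t, r, t),$$

where $\frac{d}{dt}$ is the total derivative along the trajectory, which expands to $\partial_t u + v \cdot \nabla_z u$ and can be computed with a single Jacobian–vector product — hence no higher-order derivatives are required in the loss.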
📝 Abstract
One-step generative modeling seeks to generate high-quality data samples in a single function evaluation, significantly improving efficiency over traditional diffusion or flow-based models. In this work, we introduce Modular MeanFlow (MMF), a flexible and theoretically grounded approach for learning time-averaged velocity fields. Our method derives a family of loss functions based on a differential identity linking instantaneous and average velocities, and incorporates a gradient modulation mechanism that enables stable training without sacrificing expressiveness. We further propose a curriculum-style warmup schedule to smoothly transition from coarse supervision to fully differentiable training. The MMF formulation unifies and generalizes existing consistency-based and flow-matching methods, while avoiding expensive higher-order derivatives. Empirical results across image synthesis and trajectory modeling tasks demonstrate that MMF achieves competitive sample quality, robust convergence, and strong generalization, particularly under low-data or out-of-distribution settings.
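For intuition, the differential identity linking instantaneous and average velocities can be checked numerically on a toy one-dimensional flow. The velocity field $v(t) = \sin t$ and the helper names below are illustrative choices, not from the paper:

```python
import math

# Toy check of the identity  v(t) = u(r, t) + (t - r) * du/dt
# for a position-independent velocity field v(t) = sin(t).

def v(t):
    """Instantaneous velocity (toy example)."""
    return math.sin(t)

def u_avg(r, t):
    """Time-averaged velocity: (1/(t-r)) * integral of sin(tau) over [r, t]."""
    return (math.cos(r) - math.cos(t)) / (t - r)

def du_dt(r, t, eps=1e-6):
    """Numerical d u_avg / dt via central differences."""
    return (u_avg(r, t + eps) - u_avg(r, t - eps)) / (2 * eps)

r, t = 0.2, 0.9
lhs = v(t)
rhs = u_avg(r, t) + (t - r) * du_dt(r, t)
assert abs(lhs - rhs) < 1e-6  # identity holds up to discretization error
```

In a learned model the total derivative is taken along the trajectory ($\partial_t u + v \cdot \nabla_z u$), which a single Jacobian–vector product supplies; here the field is position-independent, so a plain time derivative suffices.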