🤖 AI Summary
Generative models such as diffusion models rely on multi-step numerical integration at inference time, incurring high computational cost; consistency models enable one-step generation but lack a unified theoretical foundation. Method: We propose a paradigm that directly learns the flow map between two time points of an ordinary differential equation (ODE), unifying consistency modeling, progressive distillation, and other few-step generation approaches. Leveraging stochastic interpolants, we jointly optimize a flow-map prediction loss and a velocity-field distillation loss, combining ODE theory with neural-operator principles so that the step count can be tuned to trade precision against efficiency. Contribution/Results: On CIFAR-10 and ImageNet 32×32, our method achieves a 10–50× sampling speedup over standard diffusion models while preserving competitive sample quality, demonstrating both theoretical coherence and practical efficacy in accelerating generative inference.
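To make the velocity-field distillation idea concrete, here is a minimal NumPy sketch on a toy ODE. All names (`velocity`, `distillation_loss`, `exact_map`, `wrong_map`) are illustrative assumptions, not the paper's API: a candidate two-time map X_{s,t} is scored by how badly it violates the Lagrangian condition d/dt X_{s,t}(x) = b(t, X_{s,t}(x)), approximated by a central finite difference.

```python
import numpy as np

def velocity(t, x):
    # Teacher velocity field b(t, x) for the toy ODE dx/dt = -x.
    return -x

def distillation_loss(flow_map, xs, s, t, eps=1e-4):
    # Finite-difference surrogate for || d/dt X_{s,t}(x) - b(t, X_{s,t}(x)) ||^2,
    # averaged over a batch of samples xs.
    dX_dt = (flow_map(s, t + eps, xs) - flow_map(s, t - eps, xs)) / (2 * eps)
    return np.mean((dX_dt - velocity(t, flow_map(s, t, xs))) ** 2)

exact_map = lambda s, t, x: np.exp(-(t - s)) * x   # true flow map of dx/dt = -x
wrong_map = lambda s, t, x: (1.0 - (t - s)) * x    # first-order Euler approximation

xs = np.random.default_rng(0).normal(size=100)
print(distillation_loss(exact_map, xs, 0.0, 0.5))  # near zero: exact map satisfies the ODE
print(distillation_loss(wrong_map, xs, 0.0, 0.5))  # clearly positive
```

In training, the same residual would be minimized over a parameterized map and random pairs (s, t), with the teacher velocity coming from a pre-trained diffusion or interpolant model.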
📝 Abstract
Generative models based on dynamical transport of measure, such as diffusion models, flow matching models, and stochastic interpolants, learn an ordinary or stochastic differential equation whose trajectories push initial conditions from a known base distribution onto the target. While training is cheap, samples are generated via simulation, which is more expensive than one-step models like GANs. To close this gap, we introduce flow map matching -- an algorithm that learns the two-time flow map of an underlying ordinary differential equation. The approach leads to an efficient few-step generative model whose step count can be chosen a posteriori to smoothly trade off accuracy for computational expense. Leveraging the stochastic interpolant framework, we introduce losses for both direct training of flow maps and distillation from pre-trained (or otherwise known) velocity fields. Theoretically, we show that our approach unifies many existing few-step generative models, including consistency models, consistency trajectory models, progressive distillation, and neural operator approaches, which can be obtained as particular cases of our formalism. With experiments on CIFAR-10 and ImageNet 32×32, we show that flow map matching leads to high-quality samples with significantly reduced sampling cost compared to diffusion or stochastic interpolant methods.
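The a posteriori step-count choice can be sketched on a toy ODE where the two-time flow map is known in closed form (a stand-in for the learned map, not the paper's implementation): for dx/dt = -x, X_{s,t}(x) = exp(-(t - s)) x. The semigroup property X_{s,t} = X_{u,t} ∘ X_{s,u} means one, two, or many map evaluations all produce the same transport, whereas simulation-based sampling needs many small solver steps.

```python
import numpy as np

def velocity(t, x):
    return -x  # velocity field b(t, x) of the toy ODE dx/dt = -x

def flow_map(s, t, x):
    return np.exp(-(t - s)) * x  # exact two-time flow map X_{s,t} (stand-in for a learned map)

def euler_sample(x0, t0, t1, n_steps):
    # Conventional simulation-based sampling: many small Euler steps.
    x, t = x0, t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x, t = x + dt * velocity(t, x), t + dt
    return x

x0 = np.array([1.0, -2.0, 0.5])
one_jump  = flow_map(0.0, 1.0, x0)                       # 1 evaluation
two_jumps = flow_map(0.5, 1.0, flow_map(0.0, 0.5, x0))   # 2 evaluations, same result
euler     = euler_sample(x0, 0.0, 1.0, 1000)             # 1000 evaluations
print(np.max(np.abs(one_jump - two_jumps)))  # semigroup property: ~0
print(np.max(np.abs(one_jump - euler)))      # small Euler discretization error
```

For a learned map the semigroup identity holds only approximately, which is exactly the trade-off the abstract describes: more intermediate evaluations buy accuracy at additional cost.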