🤖 AI Summary
Existing flow matching and diffusion models suffer from discretization error under few-step sampling and exhibit color oversaturation under classifier-free guidance (CFG). To address these issues, the authors propose Gaussian Mixture Flow Matching (GMFlow), a framework that models the flow velocity as a multi-modal Gaussian mixture distribution whose parameters are predicted by the network and trained with a KL divergence loss. This formulation generalizes prior diffusion and flow matching models, which learn a single Gaussian via an $L_2$ denoising loss. Analytic GM-SDE and GM-ODE solvers exploit the predicted distributions for precise sampling in few steps (e.g., 6), and a probabilistic guidance scheme replaces CFG, mitigating oversaturation while preserving sample fidelity. On ImageNet 256$\times$256, GMFlow achieves a Precision of 0.942 using only 6 sampling steps, outperforming prior flow matching baselines.
📝 Abstract
Diffusion models approximate the denoising distribution as a Gaussian and predict its mean, whereas flow matching models reparameterize the Gaussian mean as flow velocity. However, they underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a multi-modal flow velocity distribution, which can be learned with a KL divergence loss. We demonstrate that GMFlow generalizes previous diffusion and flow matching models where a single Gaussian is learned with an $L_2$ denoising loss. For inference, we derive GM-SDE/ODE solvers that leverage analytic denoising distributions and velocity fields for precise few-step sampling. Furthermore, we introduce a novel probabilistic guidance scheme that mitigates the over-saturation issues of CFG and improves image generation quality. Extensive experiments demonstrate that GMFlow consistently outperforms flow matching baselines in generation quality, achieving a Precision of 0.942 with only 6 sampling steps on ImageNet 256$\times$256.
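To make the core idea concrete, here is a minimal NumPy sketch of the two ingredients the abstract names: a Gaussian-mixture log-density over flow velocity (whose negative value serves as a cross-entropy training objective, equivalent to the KL loss up to a constant), and the mixture's expected velocity, the quantity an ODE-style sampler would integrate. The function names, the isotropic shared-variance parameterization, and the tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gm_log_prob(v, logits, means, log_sigma):
    """Log-density of a K-component isotropic Gaussian mixture at velocity v.

    v:         (D,)   target flow velocity (e.g., from the flow matching target)
    logits:    (K,)   unnormalized mixture weights predicted by the network
    means:     (K, D) component means
    log_sigma: scalar log of the shared isotropic standard deviation

    Training would minimize -gm_log_prob(...), a cross-entropy objective that
    matches the KL divergence loss up to an additive constant.
    (Hypothetical parameterization for illustration only.)
    """
    D = v.shape[0]
    log_pi = logits - np.logaddexp.reduce(logits)        # log-softmax over components
    sq = ((v[None, :] - means) ** 2).sum(axis=1)         # (K,) squared distances
    log_comp = (-0.5 * sq * np.exp(-2.0 * log_sigma)
                - D * (log_sigma + 0.5 * np.log(2.0 * np.pi)))
    return np.logaddexp.reduce(log_pi + log_comp)        # log-sum-exp over components

def gm_mean_velocity(logits, means):
    """Expected velocity E[u] = sum_k pi_k * mu_k of the predicted mixture.

    A plain ODE solver step would use this mean; the paper's GM-SDE/ODE solvers
    additionally exploit the full analytic distribution, which a single mean
    prediction cannot provide.
    """
    pi = np.exp(logits - np.logaddexp.reduce(logits))    # softmax weights
    return pi @ means
```

Note that a single-Gaussian model only ever exposes the mean, so a multi-modal velocity distribution collapses to its average; the mixture keeps the modes separate, which is what the analytic few-step solvers rely on.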