🤖 AI Summary
Diffusion and flow-matching models generate high-quality samples but are slow at inference, and distilling them into few-step samplers often causes instability and requires extensive hyperparameter tuning. To address this, we propose Inductive Moment Matching (IMM), a new class of generative models that supports one- or few-step sampling and is trained end to end in a single stage, without initialization from a pre-trained model. Unlike distillation, IMM needs no teacher model and does not optimize two networks; unlike Consistency Models, it guarantees distribution-level convergence and remains stable across a range of hyperparameters with standard network architectures. On ImageNet-256×256, IMM reaches an FID of 1.99 with only eight sampling steps, surpassing diffusion models; on CIFAR-10, it achieves a state-of-the-art two-step FID of 1.98 among models trained from scratch.
📝 Abstract
Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often leads to instability and extensive tuning. To resolve these trade-offs, we propose Inductive Moment Matching (IMM), a new class of generative models for one- or few-step sampling with a single-stage training procedure. Unlike distillation, IMM requires neither initialization from a pre-trained model nor the optimization of two networks; unlike Consistency Models, IMM guarantees distribution-level convergence and remains stable under various hyperparameters and standard model architectures. IMM surpasses diffusion models on ImageNet-256×256 with 1.99 FID using only 8 inference steps and achieves a state-of-the-art 2-step FID of 1.98 on CIFAR-10 for a model trained from scratch.
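The abstract does not spell out IMM's training objective, so the following is only a generic illustration of what "matching distributions through moments" can mean, not the paper's loss. It computes the squared maximum mean discrepancy (MMD) between two sample sets using a Gaussian kernel, which implicitly compares all moments captured by the kernel's feature map. With a characteristic kernel such as the Gaussian, the population MMD is zero only when the two distributions are identical. The function names `gaussian_kernel` and `mmd_squared`, the kernel choice, and the bandwidth are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets.

    Equals the squared distance between the mean kernel embeddings of the two
    sets, so it is non-negative; it is a generic moment-matching loss, not
    necessarily the one used by IMM.
    """
    def mean_kernel(a_set, b_set):
        # Average kernel value over all pairs (a, b) drawn from the two sets.
        total = sum(gaussian_kernel(a, b, bandwidth) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two independent samples from the same 2-D standard normal ...
    p1 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    p2 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    # ... and one sample whose mean is shifted by 2 in each coordinate.
    q = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(200)]
    print(f"same distribution:    {mmd_squared(p1, p2):.4f}")
    print(f"shifted distribution: {mmd_squared(p1, q):.4f}")
```

Running the script should print a value near zero for the two samples from the same distribution and a clearly larger value for the shifted one. In a generative model, a loss of this kind would be computed between generated samples and target samples and minimized by gradient descent; how IMM constructs those sample sets is described in the paper, not here.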