🤖 AI Summary
This work proposes LOOM-CFM, a novel conditional flow matching (CFM) approach that addresses the inefficiency of existing methods in large-scale settings by extending minibatch optimal transport to cross-batch optimization. Unlike conventional CFM techniques that optimize data-noise pairings only within individual minibatches, LOOM-CFM preserves and continuously refines these pairings across batches over the course of training, improving the trade-off between sampling speed and generation quality. The method enhances both the sampling efficiency of continuous normalizing flows and the effectiveness of distillation-based initialization, while also enabling high-resolution image synthesis via latent-space training. Experiments demonstrate that LOOM-CFM consistently improves the sampling speed-quality trade-off across multiple benchmark datasets.
📝 Abstract
Conditional Flow Matching (CFM), a simulation-free method for training continuous normalizing flows, provides an efficient alternative to diffusion models for key tasks like image and video generation. The performance of CFM in solving these tasks depends on the way data is coupled with noise. A recent approach uses minibatch optimal transport (OT) to reassign noise-data pairs in each training step to streamline sampling trajectories and thus accelerate inference. However, its optimization is restricted to individual minibatches, limiting its effectiveness on large datasets. To address this shortcoming, we introduce LOOM-CFM (Looking Out Of Minibatch-CFM), a novel method to extend the scope of minibatch OT by preserving and optimizing these assignments across minibatches over training time. Our approach demonstrates consistent improvements in the sampling speed-quality trade-off across multiple datasets. LOOM-CFM also enhances distillation initialization and supports high-resolution synthesis in latent space training.
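To make the minibatch OT idea concrete, the sketch below shows the within-batch pairing step that LOOM-CFM generalizes: noise samples are reassigned to data samples by solving an assignment problem on squared Euclidean cost. This is an illustrative reconstruction, not the authors' implementation; the function name `minibatch_ot_pairing` and the use of SciPy's Hungarian solver are assumptions for exposition.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def minibatch_ot_pairing(x, z):
    """Reassign noise draws z to data points x within one minibatch by
    solving an optimal-transport assignment on squared Euclidean cost.
    Returns z permuted so that pair i is (x[i], z_paired[i])."""
    # Pairwise squared distances between every data/noise pair
    cost = ((x[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    # Hungarian algorithm: minimum-cost one-to-one assignment
    row_ind, col_ind = linear_sum_assignment(cost)
    return z[col_ind]

# Tiny illustration: 4 data points and 4 noise draws in 2-D
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))
z = rng.normal(size=(4, 2))
z_paired = minibatch_ot_pairing(x, z)
# The OT pairing never costs more than the arbitrary identity pairing
assert ((x - z_paired) ** 2).sum() <= ((x - z) ** 2).sum()
```

In standard minibatch-OT CFM this reassignment is recomputed and discarded every step; LOOM-CFM's contribution is to persist such assignments across minibatches and keep optimizing them over training.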