🤖 AI Summary
Existing mean flow approaches enable single-step generation for accelerated inference but struggle to accurately replicate the multi-step flow matching dynamics in continuous data modeling, leading to degraded generation fidelity and diversity. To address this, we propose OT-MeanFlow, an enhanced mean flow framework grounded in optimal transport (OT) theory. By designing theoretically principled transport paths, OT-MeanFlow ensures that single-step generation faithfully approximates the underlying multi-step flow matching dynamics without increasing the number of sampling steps, preserving computational efficiency while substantially improving sample quality. Experiments across image generation, image-to-image translation, and point cloud generation demonstrate that OT-MeanFlow consistently outperforms state-of-the-art mean flow and accelerated sampling baselines under single-step evaluation. These results validate the effectiveness and broad applicability of OT-guided simplification for flow matching.
📝 Abstract
Flow-matching generative models have emerged as a powerful paradigm for continuous data generation, achieving state-of-the-art results across domains such as images, 3D shapes, and point clouds. Despite their success, these models suffer from slow inference due to the requirement of numerous sequential sampling steps. Recent work has sought to accelerate inference by reducing the number of sampling steps. In particular, Mean Flows offer a one-step generation approach that delivers substantial speedups while retaining strong generative performance. Yet, in many continuous domains, Mean Flows fail to faithfully approximate the behavior of the original multi-step flow-matching process. In this work, we address this limitation by incorporating optimal transport-based sampling strategies into the Mean Flow framework, enabling one-step generators that better preserve the fidelity and diversity of the original multi-step flow process. Experiments on controlled low-dimensional settings and on high-dimensional tasks such as image generation, image-to-image translation, and point cloud generation demonstrate that our approach achieves superior inference accuracy in one-step generative modeling.
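To make the idea of "optimal transport-based sampling" concrete, a common way to realize it in flow-matching training is to replace random noise-data pairings with a minibatch OT assignment, so each flow path connects a data point to its cheapest-to-transport noise point. The sketch below is illustrative only (the paper's exact construction may differ): it pairs a batch of Gaussian noise with a batch of data via the Hungarian algorithm under squared Euclidean cost, the standard minibatch OT coupling.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_pair(noise, data):
    """Pair noise samples with data samples via a minibatch optimal
    transport assignment (squared Euclidean cost). Illustrative sketch,
    not the paper's exact OT-MeanFlow construction."""
    # Pairwise squared distances between all noise and data points.
    cost = ((noise[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    # Hungarian algorithm: minimum-cost one-to-one assignment.
    row, col = linear_sum_assignment(cost)
    return noise[row], data[col]

rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 2))        # x_0 ~ N(0, I)
data = rng.standard_normal((64, 2)) + 5.0   # toy "data" batch

x0, x1 = ot_pair(noise, data)
# The OT coupling never costs more than the original random pairing,
# which yields straighter, easier-to-distill transport paths.
paired_cost = ((x0 - x1) ** 2).sum()
random_cost = ((noise - data) ** 2).sum()
print(paired_cost <= random_cost)  # → True
```

Intuitively, straighter paths are exactly what lets a one-step (mean flow) generator approximate the multi-step ODE trajectory well, which is the motivation the abstract describes.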