🤖 AI Summary
This work addresses the inefficiency of diffusion and flow-based generative models, whose sampling relies on many iterative steps. The authors propose W-Flow, a single-step generation framework grounded in Wasserstein gradient flows. They first define an evolution from a reference distribution to the target data distribution as a Wasserstein gradient flow that minimizes an energy functional, which they instantiate with the Sinkhorn divergence to obtain an efficient optimal-transport-based update rule. They then train a static neural generator to compress this evolution into a single step that maps reference samples directly to data samples. Theoretically, they prove that, under suitable assumptions, the finite-sample training dynamics converge to the continuous-time distributional dynamics. Empirically, the method reaches an FID of 1.29 on ImageNet 256×256, a new state of the art among one-step generators, with improved mode coverage and domain transfer, and it samples approximately 100× faster than multi-step diffusion models with similar FID scores.
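For reference, the objects named above can be written in standard notation. This is a sketch under common conventions, not the paper's exact formulation. The symbols are our own: ν is the target distribution, c a ground cost, ε > 0 the entropic regularization, f_{μ,ν} the Sinkhorn dual potential of OT_ε(μ,ν) on the μ side, p_μ the symmetric potential of OT_ε(μ,μ), and η a step size. The last line is one explicit-Euler discretization of the flow on a finite set of particles.

```latex
\begin{aligned}
\mathrm{OT}_\varepsilon(\alpha,\beta) &= \min_{\pi \in \Pi(\alpha,\beta)} \int c(x,y)\,\mathrm{d}\pi(x,y) + \varepsilon\,\mathrm{KL}(\pi \,\|\, \alpha \otimes \beta),\\
S_\varepsilon(\alpha,\beta) &= \mathrm{OT}_\varepsilon(\alpha,\beta) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\alpha,\alpha) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\beta,\beta),\\
\mathcal{F}(\mu) &= S_\varepsilon(\mu,\nu), \qquad \frac{\delta \mathcal{F}}{\delta \mu}(\mu) = f_{\mu,\nu} - p_{\mu},\\
\partial_t \mu_t &= \nabla \cdot \Big( \mu_t \, \nabla \frac{\delta \mathcal{F}}{\delta \mu}(\mu_t) \Big),\\
x_i^{k+1} &= x_i^{k} - \eta \, \nabla \big( f_{\mu_k,\nu} - p_{\mu_k} \big)(x_i^{k}), \qquad \mu_k = \tfrac{1}{n}\textstyle\sum_{i=1}^{n} \delta_{x_i^{k}}.
\end{aligned}
```

The debiasing terms in S_ε make S_ε(ν,ν) = 0, so the flow has the target ν as a stationary point.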
📝 Abstract
Diffusion models and flow-based methods have shown impressive generative capability, especially for images, but their sampling is expensive because it requires many iterative updates. We introduce W-Flow, a framework for training a generator that transforms samples from a simple reference distribution into samples from a target data distribution in a single step. This is achieved in two steps: first, we define an evolution from the reference distribution to the target distribution through a Wasserstein gradient flow that minimizes an energy functional; second, we train a static neural generator to compress this evolution into one-step generation. We instantiate the energy functional with the Sinkhorn divergence, which yields an efficient optimal-transport-based update rule that captures global distributional discrepancy and improves coverage of the target distribution. We further prove that the finite-sample training dynamics converge to the continuous-time distributional dynamics under suitable assumptions. Empirically, W-Flow sets a new state of the art for one-step ImageNet 256×256 generation, achieving an FID of 1.29, with improved mode coverage and domain transfer. Compared to multi-step diffusion models with similar FID scores, our method samples approximately 100× faster. These results show that Wasserstein gradient flows provide a principled and effective foundation for fast and high-fidelity generative modeling.
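The two steps in the abstract (a Sinkhorn-driven flow, then compression into a one-step generator) can be illustrated with a toy NumPy sketch. It is not the authors' implementation. It assumes a squared-Euclidean cost, uniform sample weights, a fixed number of log-domain Sinkhorn iterations, and an explicit-Euler flow step of size `eta`. It also assumes a distillation loss that regresses generator outputs onto their own flowed positions held fixed (a stop-gradient target). The affine generator `G(z) = z @ W + b` and all function names (`sinkhorn_velocity`, `train_generator`, and so on) are hypothetical stand-ins for a neural network and for the paper's actual training objective.

```python
import numpy as np

# Toy sketch, NOT the authors' implementation. Assumptions: squared-Euclidean cost
# c(x, y) = |x - y|^2 / 2, uniform sample weights, and a hypothetical affine
# generator G(z) = z @ W + b standing in for a neural network.


def logsumexp(m, axis):
    """Numerically stable log(sum(exp(m))) along one axis."""
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)


def cost(x, y):
    """Pairwise cost matrix C_ij = |x_i - y_j|^2 / 2."""
    return 0.5 * ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)


def sinkhorn_potentials(x, y, eps, iters=100):
    """Log-domain Sinkhorn iterations for the dual potentials (f on x, g on y) of OT_eps."""
    C = cost(x, y)
    log_a, log_b = -np.log(len(x)), -np.log(len(y))  # uniform weights
    f, g = np.zeros(len(x)), np.zeros(len(y))
    for _ in range(iters):
        f = -eps * logsumexp(log_b + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(log_a + (f[:, None] - C) / eps, axis=0)
    return f, g


def sinkhorn_divergence(x, y, eps):
    """Debiased S_eps = OT(x, y) - OT(x, x) / 2 - OT(y, y) / 2."""
    def ot(u, v):
        # After a g-update the dual objective equals <a, f> + <b, g>.
        f, g = sinkhorn_potentials(u, v, eps)
        return f.mean() + g.mean()
    return ot(x, y) - 0.5 * ot(x, x) - 0.5 * ot(y, y)


def barycentric(x, y, g, eps):
    """Entropic barycentric projection: sum_j w_ij y_j with w_ij ~ exp((g_j - C_ij) / eps)."""
    logits = (g[None, :] - cost(x, y)) / eps
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ y


def sinkhorn_velocity(x, y, eps):
    """Flow velocity -grad(f_xy - p_x) at each particle x_i.

    For this cost, -grad f_xy(x_i) = bary_y(x_i) - x_i and grad p_x(x_i) = x_i - bary_x(x_i),
    so the x_i terms cancel and only the two projections remain.
    """
    _, g_xy = sinkhorn_potentials(x, y, eps)
    _, g_xx = sinkhorn_potentials(x, x, eps)
    return barycentric(x, y, g_xy, eps) - barycentric(x, x, g_xx, eps)


def train_generator(z, y, eps=1.0, eta=1.0, lr=0.5, steps=50):
    """Distill the flow into G by regressing G(z) onto one flow step from G(z), held fixed."""
    d = z.shape[1]
    W, b = np.eye(d), np.zeros(d)  # start from the identity map G(z) = z
    for _ in range(steps):
        x = z @ W + b  # one-step samples
        target = x + eta * sinkhorn_velocity(x, y, eps)  # explicit Euler step (stop-gradient)
        r = x - target  # r / n is d(loss)/d(x_i) for loss = 0.5 * mean_i |x_i - target_i|^2
        W -= lr * z.T @ r / len(z)
        b -= lr * r.mean(axis=0)
    return W, b
```

To try it, draw `z` from a standard normal reference and `y` from a shifted normal target. After `W, b = train_generator(z, y)`, the one-step samples `z @ W + b` should have a much smaller Sinkhorn divergence to `y` than `z` does. In this sketch the velocity reduces to the difference of two entropic barycentric projections. Each sample is pulled toward a soft average of target samples and corrected by a soft average of its own population, so every step depends on the whole sample set rather than on local neighbors only.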