🤖 AI Summary
Optimal transport (OT)-based flow matching incurs prohibitive computational overhead during large-batch training: the Sinkhorn algorithm costs $O(n^2/\varepsilon^2)$ operations per batch of $n$ pairs. To address this, we propose Semi-Discrete Flow Matching (SD-FM). Our method replaces explicit batch-wise OT computation with a learnable dual potential vector and uses maximum inner-product search (MIPS) to match freshly sampled noise to data points when training the time-dependent velocity field. This design removes the quadratic dependence on batch size and inverse regularization strength, substantially reducing training cost. Experiments across multiple benchmarks demonstrate that SD-FM outperforms both standard flow matching and OT-based FM, achieving comparable or superior generation quality at lower training cost, for both unconditional and conditional generation and across inference budgets.
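To make the training objective concrete, here is a minimal numpy sketch of the conditional flow-matching loss described above: points $x_t$ are sampled on the segment between matched pairs $(x_0, x_1)$, and the velocity field is regressed onto the constant target $x_1 - x_0$. The names `fm_loss` and `oracle` are ours for illustration, not from the paper.

```python
import numpy as np

def fm_loss(v_fn, x0, x1, t):
    """Conditional flow-matching loss for one batch of matched pairs.

    x_t = (1 - t) x0 + t x1 lies on the segment joining noise x0 to data x1;
    the regression target for the velocity field is the constant x1 - x0.
    """
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    target = x1 - x0
    return float(np.mean((v_fn(xt, t) - target) ** 2))

# Toy batch: independently sampled (plain-FM-style) noise/data pairs.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 2))   # noise samples
x1 = rng.standard_normal((64, 2))   # "data" samples
t = rng.uniform(size=64)            # one time per pair

oracle = lambda xt, tt: x1 - x0     # predictor that knows the pairing exactly
loss_oracle = fm_loss(oracle, x0, x1, t)                          # 0.0
loss_zero = fm_loss(lambda xt, tt: np.zeros_like(xt), x0, x1, t)  # > 0
```

OT-FM and SD-FM change only how the pairs $(x_0, x_1)$ are selected before this loss is evaluated; the loss itself is unchanged.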
📝 Abstract
Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE. These models are often trained using flow matching, i.e., by sampling random pairs of noise and target points $(\mathbf{x}_0,\mathbf{x}_1)$ and ensuring that the velocity field is aligned, on average, with $\mathbf{x}_1-\mathbf{x}_0$ when evaluated along a segment linking $\mathbf{x}_0$ to $\mathbf{x}_1$. While these pairs are sampled independently by default, they can also be selected more carefully by matching batches of $n$ noise to $n$ target points using an optimal transport (OT) solver. Although promising in theory, the OT flow matching (OT-FM) approach is not widely used in practice. Zhang et al. (2025) pointed out recently that OT-FM truly starts paying off when the batch size $n$ grows significantly, which only a multi-GPU implementation of the Sinkhorn algorithm can handle. Unfortunately, the costs of running Sinkhorn can quickly balloon, requiring $O(n^2/\varepsilon^2)$ operations for every $n$ pairs used to fit the velocity field, where $\varepsilon$ is a regularization parameter that should typically be small to yield better results. To fulfill the theoretical promises of OT-FM, we propose to move away from batch-OT and rely instead on a semidiscrete formulation that leverages the fact that the target dataset distribution is usually of finite size $N$. The SD-OT problem is solved by estimating a dual potential vector using SGD; using that vector, freshly sampled noise vectors at train time can then be matched with data points at the cost of a maximum inner product search (MIPS). Semidiscrete FM (SD-FM) removes the quadratic dependency on $n/\varepsilon$ that bottlenecks OT-FM. SD-FM beats both FM and OT-FM on all training metrics and inference budget constraints, across multiple datasets, on unconditional/conditional generation, or when using mean-flow models.
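The two-step recipe in the abstract (estimate a dual potential by SGD, then match noise via MIPS) can be sketched in a few lines. This is a hedged illustration under standard assumptions, not the paper's implementation: we take the squared-Euclidean cost $c(x,y)=\|x-y\|^2/2$, uniform weight $1/N$ per data point, and the classical stochastic semi-dual update for semidiscrete OT; the function names, learning rate, and step count are ours.

```python
import numpy as np

def sd_ot_dual_sgd(Y, n_steps=5000, lr=0.5, seed=0):
    """SGD estimate of the semidiscrete OT dual potential g in R^N.

    Sketch under cost c(x, y) = ||x - y||^2 / 2 and uniform target weights
    1/N; the learning rate and step count are illustrative, not the paper's.
    """
    rng = np.random.default_rng(seed)
    N, d = Y.shape
    g = np.zeros(N)
    half_sq = 0.5 * (Y ** 2).sum(axis=1)  # ||y_j||^2 / 2, precomputed once
    for _ in range(n_steps):
        x = rng.standard_normal(d)        # fresh noise sample
        # Laguerre-cell assignment, rewritten as a maximum inner product search:
        # argmin_j ||x - y_j||^2/2 - g_j == argmax_j <x, y_j> - ||y_j||^2/2 + g_j
        j = int(np.argmax(Y @ x - half_sq + g))
        # Stochastic ascent on the semi-dual: +1/N mass on every cell,
        # -1 on the cell that captured this sample.
        g += lr / N
        g[j] -= lr
    return g

def mips_match(x, Y, g):
    """Match one noise vector to a data point: a single MIPS, no Sinkhorn."""
    return int(np.argmax(Y @ x - 0.5 * (Y ** 2).sum(axis=1) + g))

# Tiny demo: match fresh noise against an 8-point "dataset".
Y = np.random.default_rng(1).standard_normal((8, 2))
g = sd_ot_dual_sgd(Y, n_steps=2000)
j = mips_match(np.zeros(2), Y, g)
```

Once `g` is estimated, each training pair costs one MIPS over the $N$ data points (amenable to approximate nearest-neighbor indexes), instead of a Sinkhorn solve over the whole batch; this is where the $O(n^2/\varepsilon^2)$ dependency disappears.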