🤖 AI Summary
In large-scale learning, minibatch optimal transport (OT) is commonly used to approximate exact OT, yet the population-level coupling induced by this approximation remains only partially understood. This work formally defines the expected minibatch OT plan as the expectation of empirical OT plans computed on independent minibatches, and establishes its consistency as the batch size grows. In the semi-discrete setting, it derives rates for both the transport-cost bias and the convergence of the expected plan to the OT plan. Applied to flow matching, the framework yields a velocity field regular enough to define a unique flow transporting a continuous source distribution to a discrete target distribution. A tractable two-atom model, together with synthetic and image experiments, quantifies the trade-off between OT batch size and numerical integration error.
📝 Abstract
Solving optimal transport (OT) on random minibatches is a common surrogate for exact OT in large-scale learning. In flow matching (FM), this surrogate is used to obtain OT-like couplings that can straighten probability paths and reduce numerical integration cost. Yet, the population-level coupling induced by repeated minibatch OT remains only partially understood. We formalize this coupling as the expected batch OT plan $\overline{\pi}_{k}$, obtained by averaging empirical OT plans over independent minibatches of size $k$. We then establish its large-batch consistency and, in the semidiscrete case relevant to generative modeling, derive rates for both the transport-cost bias and the convergence of $\overline{\pi}_{k}$ to the OT plan. For FM, this yields a population coupling whose induced velocity field is regular enough to define a unique flow from the source to the discrete target. We finally quantify how OT batch size interacts with numerical integration in a tractable two-atom model and in synthetic and image experiments.
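To make the transport-cost bias concrete, here is a minimal sketch on a toy problem. It is not the paper's experimental setup. It assumes a one-dimensional standard Gaussian source, a target with equal mass on the two atoms -1 and +1, and the squared Euclidean cost. Because the transport cost is linear in the plan, the cost of the expected batch OT plan equals the average of the minibatch OT costs, so it can be estimated by Monte Carlo. The function names `minibatch_ot_cost` and `expected_batch_cost` are hypothetical and chosen for illustration only.

```python
import math
import random


def minibatch_ot_cost(k, rng):
    """Squared-Euclidean OT cost between two size-k minibatches in 1D.

    In one dimension with a convex cost, matching sorted samples is optimal,
    so sorting stands in for a general OT solver in this toy setting.
    """
    xs = sorted(rng.gauss(0.0, 1.0) for _ in range(k))      # continuous source N(0, 1)
    ys = sorted(rng.choice((-1.0, 1.0)) for _ in range(k))  # discrete target on {-1, +1}
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / k


def expected_batch_cost(k, n_batches, seed=0):
    """Monte Carlo estimate of the transport cost of the expected batch OT plan."""
    rng = random.Random(seed)
    return sum(minibatch_ot_cost(k, rng) for _ in range(n_batches)) / n_batches


# Exact OT cost for this toy problem: x is sent to sign(x), giving E[(|x| - 1)^2].
exact_cost = 2.0 - 2.0 * math.sqrt(2.0 / math.pi)

costs = {k: expected_batch_cost(k, n_batches=2000) for k in (1, 4, 16, 64)}
for k, c in costs.items():
    print(f"k={k:3d}  estimated cost={c:.3f}  bias={c - exact_cost:.3f}")
```

With $k = 1$ the pairing is independent, so the estimate should sit near the independent-coupling cost of 2. Larger batches should move it toward the exact OT cost, which is the qualitative behavior the bias rates describe.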