🤖 AI Summary
Existing neural optimal transport methods typically rely on adversarial min-max optimization and multi-network architectures, which can make training unstable and computationally expensive. This work proposes a single-network framework that parameterizes only one potential function in the Kantorovich dual and recasts the c-transform as a proximal fixed-point problem. Dual feasibility is enforced exactly through the proximal optimality conditions, and gradients are computed without adversarial training, implicit differentiation, or backpropagation through the fixed-point iterations. The method recovers both forward and backward transport maps and extends naturally to class-conditional settings. Experiments on high-dimensional Gaussian benchmarks, physical datasets, and image translation tasks report strong transport accuracy, improved training stability, and favorable computational and memory efficiency.
📝 Abstract
We propose an implicit neural formulation of optimal transport that eliminates adversarial min-max optimization and multi-network architectures commonly used in existing approaches. Our key idea is to parameterize a single potential in the Kantorovich dual and reformulate the associated c-transform as a proximal fixed-point problem. This yields a stable single-network framework in which dual feasibility is enforced exactly through proximal optimality conditions rather than adversarial training. Despite the inner fixed-point computation, gradients can be computed without differentiating through the fixed-point iterations, enabling efficient training without requiring implicit differentiation. We further establish convergence of stochastic gradient descent. The resulting framework is efficient, scalable, and broadly applicable: it simultaneously recovers forward and backward transport maps and naturally extends to class-conditional settings. Experiments on high-dimensional Gaussian benchmarks, physical datasets, and image translation tasks demonstrate strong transport accuracy together with improved training stability and favorable computational and memory efficiency.
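To make the fixed-point idea concrete, here is a minimal sketch under assumptions that are not taken from the paper: a quadratic cost c(x, y) = ½‖x − y‖², and a toy one-parameter potential φ_θ(y) = −½θ‖y‖² standing in for the neural network. With this cost, the minimizer y* of c(x, y) − φ_θ(y) satisfies the fixed-point equation y* = x + ∇φ_θ(y*), and by the envelope theorem the θ-gradient of the c-transform equals −∂_θ φ_θ(y*) with y* held constant, so nothing is differentiated through the iterations. All function names are hypothetical, and the paper's actual algorithm may differ.

```python
import numpy as np

def phi(y, theta):
    # Toy potential phi_theta(y) = -0.5 * theta * ||y||^2.
    # Assumption for illustration only; the paper uses a neural network.
    return -0.5 * theta * np.sum(y * y, axis=-1)

def grad_phi(y, theta):
    # Gradient of the toy potential with respect to y.
    return -theta * y

def c_transform_argmin(x, theta, n_iter=200, tol=1e-12):
    # Fixed-point iteration y <- x + grad_phi(y): the first-order optimality
    # condition of min_y 0.5 * ||x - y||^2 - phi(y).
    # For this toy potential the map is a contraction when 0 <= theta < 1.
    y = x.copy()
    for _ in range(n_iter):
        y_next = x + grad_phi(y, theta)
        converged = np.max(np.abs(y_next - y)) < tol
        y = y_next
        if converged:
            break
    return y

def c_transform(x, theta):
    # phi^c(x) = min_y [c(x, y) - phi(y)], evaluated at the fixed point y*.
    y = c_transform_argmin(x, theta)
    value = 0.5 * np.sum((x - y) ** 2, axis=-1) - phi(y, theta)
    return value, y

def envelope_grad_theta(x, theta):
    # Envelope theorem: d/dtheta phi^c(x) = -d/dtheta phi_theta(y*), with y*
    # treated as a constant. For the toy potential this is 0.5 * ||y*||^2.
    _, y = c_transform(x, theta)
    return 0.5 * np.sum(y * y, axis=-1)
```

For this toy potential the fixed point has the closed form y* = x / (1 + θ), which gives a quick sanity check on the iteration, and the envelope gradient can be compared against a finite-difference estimate of the c-transform. Because φ^c(x) is the minimum over y of c(x, y) − φ_θ(y), the pair (φ_θ, φ^c) satisfies φ_θ(y) + φ^c(x) ≤ c(x, y) by construction, which is the dual feasibility property the abstract refers to.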