🤖 AI Summary
This work addresses the tendency of semi-dual neural optimal transport to converge to degenerate solutions when data concentrate on low-dimensional manifolds, a pathology that arises because the objective is under-constrained off the manifold and therefore yields unreliable maps there. To resolve this ambiguity, the authors introduce additive-noise smoothing and, combining stability properties of optimal transport plans with a bias analysis and finite-sample error bounds, derive a computable terminal noise level $\varepsilon_{\text{stat}}(N)$ whose scaling is governed by the intrinsic data dimension $m$. The approach attains statistically optimal convergence rates independent of the ambient dimension and yields a principled stopping criterion: annealing the noise below $\varepsilon_{\text{stat}}(N)$ degrades optimization conditioning without improving estimation accuracy.
📝 Abstract
Semi-dual neural optimal transport learns a transport map via a max-min objective, yet training can converge to incorrect or degenerate maps. We fully characterize these spurious solutions in the common regime where data concentrate on low-dimensional manifolds: the objective is under-constrained off the data manifold, while the on-manifold transport signal remains identifiable. Following Choi, Choi, and Kwon (2025), we study additive-noise smoothing as a remedy and prove new map-recovery guarantees as the noise vanishes. Our main practical contribution is a computable terminal noise level $\varepsilon_{\mathrm{stat}}(N)$ that attains the optimal statistical rate, with scaling governed by the intrinsic dimension $m$ of the data. The formula arises from a unified theoretical analysis of (i) quantitative stability of optimal plans, (ii) smoothing-induced bias, and (iii) finite-sample error, yielding rates that depend on $m$ rather than the ambient dimension. Finally, we show that the reduced semi-dual objective becomes increasingly ill-conditioned as $\varepsilon \downarrow 0$. This provides a principled stopping rule: annealing below $\varepsilon_{\mathrm{stat}}(N)$ can $\textit{worsen}$ optimization conditioning without improving statistical accuracy.
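The smoothing-plus-annealing scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: additive-noise smoothing convolves the empirical measure with an isotropic Gaussian, and a geometric annealing schedule is floored at the terminal level so the noise never drops below it. The value of `eps_stat` here is a placeholder; the paper derives the actual $\varepsilon_{\mathrm{stat}}(N)$ from its stability, bias, and finite-sample analysis, and the training step is elided.

```python
import numpy as np

def smooth_samples(X, eps, rng):
    """Additive-noise smoothing: replace each sample x_i by x_i + eps * z_i
    with z_i ~ N(0, I), i.e. convolve the empirical measure with N(0, eps^2 I)."""
    return X + eps * rng.standard_normal(X.shape)

def anneal_schedule(eps0, eps_stat, num_stages, decay=0.5):
    """Geometric annealing eps0 * decay^k, clipped from below at eps_stat:
    per the stopping rule, going below the statistical floor does not help."""
    return [max(eps0 * decay**k, eps_stat) for k in range(num_stages)]

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))  # stand-in for data near a low-dim manifold
eps_stat = 0.05                     # placeholder; the paper computes eps_stat(N)

for eps in anneal_schedule(eps0=1.0, eps_stat=eps_stat, num_stages=8):
    X_eps = smooth_samples(X, eps, rng)
    # ... run one stage of semi-dual OT training on the smoothed samples ...
```

The floor in `anneal_schedule` encodes the paper's stopping criterion: the schedule decays geometrically until it reaches `eps_stat` and then holds there, since smaller noise only worsens conditioning.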