🤖 AI Summary
To address the trade-off between bias and convergence rate induced by entropy regularization in semi-discrete optimal transport (OT), this paper proposes DRAG: Decreasing (entropic) Regularization Averaged Gradient. DRAG operates within a stochastic gradient descent framework, adaptively decaying the entropy regularization strength while incorporating gradient averaging to progressively eliminate regularization bias without sacrificing computational efficiency. Theoretically, DRAG guarantees $\mathcal{O}(1/t)$ convergence rates for both the OT cost and dual potential estimates, and an $\mathcal{O}(1/\sqrt{t})$ rate for the OT map; moreover, as the regularization coefficient vanishes, the solution converges to the unbiased primal OT solution. Empirical evaluations demonstrate DRAG’s superior accuracy, faster convergence, and enhanced robustness compared to existing methods.
📝 Abstract
Adding entropic regularization to Optimal Transport (OT) problems has become a standard approach for designing efficient and scalable solvers. However, regularization introduces a bias relative to the true solution. To mitigate this bias while still benefiting from the acceleration provided by regularization, a natural solver would adaptively decrease the regularization as it approaches the solution. Although some algorithms heuristically implement this idea, their theoretical guarantees and the extent of their acceleration compared to using a fixed regularization remain largely open. In the setting of semi-discrete OT, where the source measure is continuous and the target is discrete, we prove that decreasing the regularization can indeed accelerate convergence. To this end, we introduce DRAG: Decreasing (entropic) Regularization Averaged Gradient, a stochastic gradient descent algorithm where the regularization decreases with the number of optimization steps. We provide a theoretical analysis showing that DRAG benefits from decreasing regularization compared to a fixed scheme, achieving an unbiased $\mathcal{O}(1/t)$ sample and iteration complexity for both the OT cost and the potential estimation, and a $\mathcal{O}(1/\sqrt{t})$ rate for the OT map. Our theoretical findings are supported by numerical experiments that validate the effectiveness of DRAG and highlight its practical advantages.
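To make the idea concrete, here is a minimal sketch of a DRAG-style solver for semi-discrete entropic OT: stochastic gradient ascent on the smoothed semi-discrete dual, where the regularization $\varepsilon_t$ decreases with the iteration count and the iterates are averaged. The function name, the $\varepsilon_t = \varepsilon_0/t$ schedule, and the $1/\sqrt{t}$ step size below are illustrative assumptions, not the paper's exact constants or schedules.

```python
import numpy as np

def drag_semidiscrete_ot(sample_source, y, nu, cost, n_iters=2000,
                         eps0=1.0, step0=1.0, rng=None):
    """Hypothetical DRAG-style sketch: SGD on the entropic semi-discrete
    OT dual with a decreasing regularization schedule and iterate averaging.

    sample_source(rng) draws one sample from the continuous source measure;
    y holds the discrete target support, nu its (positive) weights;
    cost(x, y) returns the vector of costs c(x, y_j).
    """
    rng = np.random.default_rng(rng)
    m = len(nu)
    g = np.zeros(m)        # dual potential on the discrete support
    g_avg = np.zeros(m)    # running (Polyak-style) average of the iterates
    for t in range(1, n_iters + 1):
        eps_t = eps0 / t           # decreasing entropic regularization (assumed schedule)
        x = sample_source(rng)     # one sample from the source measure
        c = cost(x, y)             # costs c(x, y_j), shape (m,)
        # weighted softmax of (g - c)/eps_t: the entropic "soft assignment" of x
        logits = (g - c) / eps_t + np.log(nu)
        logits -= logits.max()     # stabilize before exponentiating
        p = np.exp(logits)
        p /= p.sum()
        grad = nu - p              # stochastic gradient of the smoothed dual
        g += (step0 / np.sqrt(t)) * grad   # ascent step (assumed step size)
        g_avg += (g - g_avg) / t           # incremental running mean
    return g_avg
```

As $\varepsilon_t \to 0$ the soft assignment concentrates on the nearest target point (a Laguerre-cell assignment), so the averaged iterate tracks the unregularized dual solution rather than a fixed-$\varepsilon$ biased one.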