🤖 AI Summary
This work addresses the challenge of efficiently sampling worst-case distributions in distributionally robust optimization (DRO). Methodologically, it establishes a mathematically rigorous gradient flow framework governed by partial differential equations, integrating Wasserstein–Fisher–Rao and Stein variational gradient flow theories into DRO. Coupled with Markov chain Monte Carlo (MCMC) sampling, the approach formulates a continuous-time evolution model over the space of probability distributions and designs corresponding discretization algorithms for solving both Wasserstein and Sinkhorn DRO problems. Key contributions include: (i) the first unified interpretation of several classical DRO algorithms as instances of gradient flows, elucidating their convergence dynamics and theoretical limits; (ii) a novel, interpretable, and analyzable paradigm for distributional optimization; and (iii) empirical validation—via numerical experiments—of effective worst-case distribution generation, algorithmic reproducibility, and performance improvement, thereby enhancing both theoretical depth and computational tractability.
📝 Abstract
We propose a mathematically principled PDE gradient flow framework for distributionally robust optimization (DRO). Exploiting recent advances at the intersection of Markov chain Monte Carlo (MCMC) sampling and gradient flow theory, we show that our theoretical framework can be implemented as practical algorithms for sampling from worst-case distributions and, consequently, for solving DRO problems. While numerous previous works have proposed various reformulation techniques and iterative algorithms, we contribute a sound gradient flow view of distributional optimization that can be used to construct new algorithms. As an example application, we solve a class of Wasserstein and Sinkhorn DRO problems using the recently discovered Wasserstein–Fisher–Rao and Stein variational gradient flows. Notably, we also show that simple reductions of our framework exactly recover several popular previously proposed DRO methods, providing new insights into their theoretical limits and optimization dynamics. Numerical studies based on stochastic gradient descent provide empirical backing for our theoretical findings.
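To make the particle-based discretization of a distributional gradient flow concrete, here is a minimal Stein variational gradient descent (SVGD) sketch on a toy Gaussian target. This is an illustrative reference for the general mechanism only, not the paper's algorithm: the RBF kernel, fixed bandwidth `h`, step size `eps`, and the standard-normal target are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    """RBF kernel matrix K[j, i] = k(x_j, x_i) and its gradient in x_j."""
    diff = X[:, None, :] - X[None, :, :]          # diff[j, i] = x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))
    gradK = -diff / h ** 2 * K[:, :, None]        # grad_{x_j} k(x_j, x_i)
    return K, gradK

def svgd_step(X, grad_log_p, eps=0.05, h=1.0):
    """One SVGD update: attraction along the score plus kernel repulsion."""
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (K.T @ grad_log_p(X) + gradK.sum(axis=0)) / n
    return X + eps * phi

# Toy target: standard Gaussian N(0, 1), whose score is grad log p(x) = -x.
grad_log_p = lambda X: -X

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(100, 1))            # particles start far from the target
for _ in range(500):
    X = svgd_step(X, grad_log_p)
print(np.mean(X), np.std(X))                      # particles drift toward the target
```

In a DRO setting, the score of a worst-case distribution would replace `grad_log_p`, and the particle cloud would serve as samples from that distribution; the update rule itself is unchanged.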