🤖 AI Summary
This work establishes the optimization foundations of the discrete sliced Wasserstein (SW) loss, specifically analyzing the regularity and nonsmooth structure of the SW₂² energy with respect to support point locations for two uniform discrete measures with equal cardinality.
Method: We develop a novel Clarke subdifferential analysis framework for the SW loss and rigorously analyze its Monte Carlo approximation.
Contribution/Results: We prove, for the first time, that critical points of the Monte Carlo estimator converge almost surely to those of the true SW₂² energy; we further establish uniform convergence and a central limit theorem for the estimator. Moreover, we demonstrate that stochastic gradient descent (SGD) converges to generalized critical points despite the nonsmoothness of the loss. These results provide rigorous theoretical guarantees for SW-based generative modeling and fill a fundamental gap in the optimization-theoretic understanding of discrete SW losses.
📝 Abstract
The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems share the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same number of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process $\mathcal{E}_p(Y)$. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
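To make the object of study concrete, here is a minimal sketch (not code from the paper) of the Monte-Carlo estimator $\mathcal{E}_p(Y)$, assuming the standard construction: project both supports onto $p$ random directions drawn uniformly on the unit sphere, compute the closed-form 1-D squared 2-Wasserstein distance between the projected uniform measures via sorting, and average over directions. The function name `sw2_mc` and its signature are illustrative.

```python
import numpy as np

def sw2_mc(Y, Z, p, seed=None):
    """Monte-Carlo estimate of SW_2^2(gamma_Y, gamma_Z) for two uniform
    discrete measures with the same number n of support points in R^d.

    Y, Z : arrays of shape (n, d), the support points of each measure.
    p    : number of random projection directions (Monte-Carlo samples).
    """
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    # Draw p directions uniformly on the unit sphere S^{d-1}
    theta = rng.standard_normal((p, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both supports: shape (n, p), one column per direction
    proj_Y = Y @ theta.T
    proj_Z = Z @ theta.T
    # In 1-D, W_2^2 between uniform measures with n points each is the
    # mean squared difference of the sorted projections
    proj_Y.sort(axis=0)
    proj_Z.sort(axis=0)
    # Average over the n points (1-D W_2^2) and the p directions (MC mean)
    return np.mean((proj_Y - proj_Z) ** 2)

# Example: the energy vanishes when the supports coincide
Y = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
Z = Y + np.array([3.0, -1.0])  # translated copy of Y
print(sw2_mc(Y, Y, p=100, seed=0))  # 0.0
print(sw2_mc(Y, Z, p=100, seed=0) > 0.0)  # True
```

The sorting step is where the nonsmoothness analysed in the paper enters: the optimal 1-D assignment changes discontinuously as support points cross each other along a projection direction, so $\mathcal{E}$ and $\mathcal{E}_p$ are only piecewise smooth in $Y$, motivating the Clarke subdifferential framework.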