🤖 AI Summary
This work addresses the computational difficulty of minimizing the KL divergence in mean-field variational inference (MFVI). To circumvent this challenge, we construct a finite-dimensional polyhedral subset 𝒫⋄ of the Wasserstein space and minimize the KL divergence to the target posterior π over 𝒫⋄. Our method establishes a theoretical framework for polyhedral geometry in Wasserstein space and devises an accelerated first-order gradient algorithm, sketched below. Theoretical contributions include: (i) the first end-to-end convergence analysis for gradient-based MFVI; (ii) provable approximation error bounds when π is strongly log-concave and log-smooth; and (iii) an optimal O(1/k²) convergence rate for the accelerated method. Experiments demonstrate that the approach accurately constructs product measures approximating the optimal mean-field solution π*.
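The summary does not spell out the optimizer, but the O(1/k²) rate is the signature of Nesterov-type acceleration for smooth convex problems. Below is a minimal sketch of accelerated gradient descent over ℝᵈ, standing in for the finite-dimensional objective induced by the polyhedral parametrization; the names `nesterov_agd`, `grad`, and `L` are our own illustrative choices, not from the paper.

```python
import numpy as np

def nesterov_agd(grad, x0, L, num_iters):
    """Nesterov's accelerated gradient descent for an L-smooth convex
    objective on R^d; attains the optimal O(1/k^2) rate in function value.
    Hypothetical sketch: in the paper's setting, x would parametrize a
    point of the polyhedral set and the objective would be KL(.|pi)."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(num_iters):
        x_next = y - grad(y) / L                           # gradient step at momentum point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0  # FISTA-style momentum schedule
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # extrapolation
        x, t = x_next, t_next
    return x

# Sanity check on a strongly convex quadratic stand-in:
A = np.diag(np.linspace(1.0, 10.0, 5))
x_min = nesterov_agd(grad=lambda x: A @ x, x0=np.ones(5), L=10.0, num_iters=200)
print(np.linalg.norm(x_min))  # ~0: converges to the minimizer
```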
📝 Abstract
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a \emph{polyhedral} set $\mathcal{P}_\diamond$, and (2) an algorithm for minimizing $\text{KL}(\cdot\|\pi)$ over $\mathcal{P}_\diamond$ based on accelerated gradient descent over $\mathbb{R}^d$. As a byproduct of our analysis, we obtain the first end-to-end analysis for gradient-based algorithms for MFVI.
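As a concrete instance of the setting, consider a Gaussian target, which is both strongly log-concave and log-smooth. For Gaussians the optimal product measure is known in closed form (target mean, variances $1/(\Sigma^{-1})_{ii}$), so the mean-field objective $\text{KL}(\cdot\|\pi)$ can be evaluated exactly. The snippet below is a self-contained illustration of this classical fact, not code from the paper.

```python
import numpy as np

def kl_gauss(m0, S0, m1, S1):
    """Closed-form KL( N(m0, S0) || N(m1, S1) )."""
    d = len(m0)
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Target pi: a correlated Gaussian (strongly log-concave and log-smooth).
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])

# Classical mean-field fact: the best product Gaussian keeps the target mean
# and takes variances 1 / (Sigma^{-1})_{ii}.
S_mf = np.diag(1.0 / np.diag(np.linalg.inv(Sigma)))
print(kl_gauss(mu, S_mf, mu, Sigma))  # KL(pi_star || pi): small but nonzero
```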