🤖 AI Summary
Although stochastic natural gradient variational inference (NGVI) is widely used, its non-asymptotic convergence under various stepsize and minibatch scheduling schemes lacks theoretical guarantees. This work addresses this gap by proposing a projected stochastic NGVI algorithm for exponential-family variational distributions and establishes, for the first time, systematic non-asymptotic convergence rates across multiple combinations of stepsizes (fixed or decreasing) and minibatch sizes (fixed or increasing). Specifically, with fixed hyperparameters, the algorithm converges geometrically to a neighborhood of the optimal solution; under appropriate scheduling, it achieves a global convergence rate of 𝒪(1/T^ρ), in some regimes with ρ ≥ 1. These results significantly extend the theoretical foundations of NGVI and provide principled guidance for balancing computational resources and inference accuracy in practical applications.
📝 Abstract
Stochastic natural gradient variational inference (NGVI) is a popular and efficient algorithm for Bayesian inference. Despite its empirical success, the convergence of this method is still not fully understood. In this work, we define and study a projected stochastic NGVI when the variational distributions form an exponential family. Stochasticity arises when the gradients are either intractable expectations or large sums. We prove new non-asymptotic convergence results for combinations of constant or decreasing step sizes and constant or increasing sample/batch sizes. When all hyperparameters are fixed, NGVI is shown to converge geometrically to a neighborhood of the optimum, while we establish convergence to the optimum with rates of the form $\mathcal{O}\left(\frac{1}{T^\rho}\right)$, possibly with $\rho \geq 1$, for all other combinations of step size and sample/batch size schedules. These rates apply when the target posterior distribution is close in some sense to the considered exponential family. Our theoretical results extend existing NGVI and stochastic optimization results and provide more flexibility to adjust, in a principled way, step sizes and sample/batch sizes in order to meet speed, resource, or accuracy constraints.
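To make the abstract's setting concrete, here is a minimal toy sketch (not the paper's implementation) of a projected stochastic natural gradient step for a one-dimensional Gaussian variational family in natural-parameter form. It relies on the standard exponential-family fact that the natural gradient of the KL objective with respect to the natural parameters equals the ordinary gradient with respect to the expectation parameters, which for a conjugate Gaussian target reduces to `lam - lam_post`. The noise scale `1/sqrt(batch)`, the fixed-versus-decreasing step toggle, and the projection threshold are all illustrative assumptions, not values from the paper.

```python
import math
import random


def projected_sngvi(lam_post, T=200, step=0.3, batch=10, seed=0):
    """Toy projected stochastic NGVI for a 1-D Gaussian family.

    lam_post: natural parameters (mu/sigma^2, -1/(2*sigma^2)) of the
    Gaussian target.  Zero-mean noise of scale 1/sqrt(batch) mimics
    minibatch gradient estimates, and the second natural parameter is
    projected to stay negative so q remains a valid Gaussian.
    """
    rng = random.Random(seed)
    lam = [2.0, -2.0]  # natural parameters of the initial q
    for t in range(1, T + 1):
        eta = step  # fixed step size; use step / t for a decreasing schedule
        noise = [rng.gauss(0.0, 1.0 / math.sqrt(batch)) for _ in lam]
        # Natural gradient = gradient in expectation parameters = lam - lam_post,
        # perturbed by the simulated minibatch noise.
        grad = [lam[i] - lam_post[i] + noise[i] for i in range(2)]
        lam = [lam[i] - eta * grad[i] for i in range(2)]
        lam[1] = min(lam[1], -1e-6)  # projection onto valid natural parameters
    return lam
```

With a fixed step and batch size, the iterate contracts geometrically toward `lam_post` but then hovers in a noise-dominated neighborhood, mirroring the fixed-hyperparameter regime described above; a decreasing step or growing batch shrinks that neighborhood over time.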