🤖 AI Summary
Variational inference (VI) poses theoretical challenges for convergence analysis: the ELBO objective is inherently nonconvex and nonsmooth, so establishing guarantees for its optimization is difficult.
Method: Leveraging the structural properties of the log-partition function for exponential-family distributions, we reformulate the negative ELBO as a Bregman divergence, thereby establishing a unified information-geometric framework for VI analysis.
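For orientation, the standard identity underlying such reformulations (the paper's exact ELBO decomposition is not reproduced here) is that, for an exponential family $p_\theta(x) = h(x)\exp\big(\langle \theta, T(x)\rangle - A(\theta)\big)$ with log-partition function $A$, the KL divergence between members is a Bregman divergence of $A$:

$$
\mathrm{KL}\big(p_\theta \,\|\, p_{\theta'}\big) \;=\; D_A(\theta', \theta) \;=\; A(\theta') - A(\theta) - \big\langle \nabla A(\theta),\, \theta' - \theta \big\rangle .
$$

Since $A$ is convex, $D_A \ge 0$, and the geometry of the optimization landscape is controlled by the curvature of $A$.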
Contribution/Results: We identify a weak monotonicity property of the ELBO optimization landscape and, by exploiting the spectral characteristics of the Fisher information matrix, derive the first non-asymptotic convergence-rate bounds for gradient descent under both fixed and diminishing step sizes. Crucially, the analysis dispenses with the standard strong-convexity and Lipschitz-gradient assumptions, significantly broadening the applicability of VI convergence theory. The result is a geometrically grounded convergence theory for Bayesian variational learning with explicit, non-asymptotic rate guarantees.
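As a toy illustration of the two step-size regimes analyzed (this is a minimal sketch on a hypothetical one-dimensional objective, not the paper's algorithm): for a unit-variance Gaussian in natural parameterization, the log-partition function is $A(\theta) = \theta^2/2$, and the negative log-likelihood $f(\theta) = A(\theta) - \theta\,\bar t$ has gradient $\theta - \bar t$ and minimizer $\theta^\star = \bar t$.

```python
def grad_descent(grad, theta0, steps, step_size):
    """Gradient descent; step_size is a schedule t -> eta_t."""
    theta = theta0
    for t in range(steps):
        theta = theta - step_size(t) * grad(theta)
    return theta

# Toy exponential family: Gaussian with unit variance, natural parameter theta,
# log-partition A(theta) = theta**2 / 2, so grad A(theta) = theta.
# Objective f(theta) = A(theta) - theta * t_bar (negative log-likelihood up to
# a constant); gradient f'(theta) = theta - t_bar, minimizer theta* = t_bar.
t_bar = 1.5
grad_f = lambda theta: theta - t_bar

# Fixed step size eta_t = 0.1 (linear error decay on this quadratic).
theta_const = grad_descent(grad_f, theta0=0.0, steps=200,
                           step_size=lambda t: 0.1)

# Diminishing step size eta_t = 1 / (t + 2) (slower, O(1/t) error here).
theta_dimin = grad_descent(grad_f, theta0=0.0, steps=200,
                           step_size=lambda t: 1.0 / (t + 2))

print(theta_const, theta_dimin)  # both approach 1.5
```

On this quadratic the fixed step contracts the error geometrically, while the diminishing schedule leaves an error of order $1/T$ after $T$ steps, which mirrors the qualitative rate separation one expects between the two regimes.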
📝 Abstract
Variational Inference (VI) provides a scalable framework for Bayesian inference by optimizing the Evidence Lower Bound (ELBO), but convergence analysis remains challenging due to the objective's non-convexity and non-smoothness in Euclidean space. We establish a novel theoretical framework for analyzing VI convergence by exploiting the exponential-family structure of the variational distributions. We express the negative ELBO as a Bregman divergence with respect to the log-partition function, enabling a geometric analysis of the optimization landscape. We show that this Bregman representation admits a weak monotonicity property that, while weaker than convexity, provides sufficient structure for rigorous convergence analysis. By deriving bounds on the objective along rays in parameter space, we establish landscape properties governed by the spectral characteristics of the Fisher information matrix. Under this geometric framework, we prove non-asymptotic convergence rates for gradient descent with both constant and diminishing step sizes.
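A standard fact that connects the two ingredients above (stated here for context; the abstract does not spell it out): for an exponential family with log-partition function $A$ and sufficient statistic $T$, the Fisher information matrix is the Hessian of $A$,

$$
\mathcal{I}(\theta) \;=\; \nabla^2 A(\theta) \;=\; \mathrm{Cov}_{p_\theta}\!\big(T(x)\big) \;\succeq\; 0,
$$

so eigenvalue bounds on $\mathcal{I}(\theta)$ translate directly into curvature bounds on the Bregman representation of the negative ELBO along rays in parameter space.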