Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

📅 2023-07-27
🏛️ International Conference on Artificial Intelligence and Statistics
📈 Citations: 8
Influential: 0
🤖 AI Summary
This work analyzes the convergence and gradient variance of the sticking-the-landing (STL) estimator in black-box variational inference (BBVI). It establishes a **quadratic upper bound** on the STL gradient variance that holds even when the variational family is misspecified, and shows that, combined with prior results on the quadratic variance condition, this yields a **non-asymptotic geometric (linear) convergence rate** for BBVI with projected stochastic gradient descent (SGD) under perfect variational family specification. To make the projection step practical, the analysis works over a domain of **triangular scale matrices admitting a linear-time projection**, so each projection costs only Θ(d), where d is the dimensionality of the target posterior. The paper also sharpens the existing analysis of the standard closed-form entropy gradient estimator, enabling an **explicit non-asymptotic complexity comparison** that quantifies STL's variance-reduction advantage. Together, these results cover both well-specified and misspecified variational families and strengthen the theoretical foundations of BBVI.
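For intuition, here is a minimal PyTorch sketch of one STL gradient estimate for a Gaussian location-scale family. This is not the paper's code: `log_p`, `m`, `L`, and `n_samples` are hypothetical names, and `L` is assumed to be a lower-triangular scale factor with positive diagonal and `requires_grad=True`.

```python
import torch

def stl_elbo_grad(log_p, m, L, n_samples=16):
    """One STL gradient estimate for q(z) = N(m, L L^T).
    `log_p` maps an (n_samples, d) batch to (n_samples,) log-densities."""
    eps = torch.randn(n_samples, m.shape[0])
    z = m + eps @ L.tril().T                       # reparameterized samples
    # Evaluating log q at *detached* parameters drops the score term;
    # this is exactly the sticking-the-landing estimator.
    q_stop = torch.distributions.MultivariateNormal(
        m.detach(), scale_tril=L.detach().tril())
    elbo = (log_p(z) - q_stop.log_prob(z)).mean()
    elbo.backward()          # populates m.grad, L.grad (accumulates
    return m.grad, L.grad    # across calls; zero them between steps)
```

When the variational family matches the posterior exactly, this estimator's variance vanishes at the optimum, which is the intuition behind the linear convergence rate.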
📝 Abstract
We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, the projection onto which is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve the existing analysis of the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.
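The $\Theta(d)$ projection claim is easy to see for a domain that lower-bounds the diagonal of a triangular scale matrix: only the $d$ diagonal entries can violate the constraint. The sketch below assumes that form of constraint; the specific feasible set and the bound `delta` are illustrative, not quoted from the paper.

```python
import torch

def project_scale(L, delta=1e-3):
    """Project a lower-triangular scale matrix onto
    {L : L_ii >= delta}. Only the d diagonal entries are touched,
    hence Theta(d) time. `delta` is an assumed domain parameter."""
    with torch.no_grad():
        L.diagonal().clamp_(min=delta)   # in-place, O(d)
    return L
```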
Problem

Research questions and friction points this paper is trying to address.

Analyzing the convergence rate of black-box variational inference with control variates
Proving geometric convergence under perfect variational family specification
Comparing the STL estimator against regular closed-form entropy gradient estimators (a sketch of the latter follows below)
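To make the comparison concrete, here is the regular estimator under the same assumed Gaussian family: Monte Carlo for the $\mathbb{E}_q[\log p(z)]$ term plus the exact Gaussian entropy $H(q) = \frac{d}{2}\log(2\pi e) + \sum_i \log L_{ii}$. Names are hypothetical, as in the STL sketch above.

```python
import math
import torch

def entropy_form_elbo_grad(log_p, m, L, n_samples=16):
    """Regular estimator: Monte Carlo for E_q[log p(z)] plus the
    exact Gaussian entropy; only the log p term is stochastic."""
    d = m.shape[0]
    eps = torch.randn(n_samples, d)
    z = m + eps @ L.tril().T
    entropy = 0.5 * d * math.log(2 * math.pi * math.e) \
              + torch.log(L.diagonal()).sum()
    elbo = log_p(z).mean() + entropy
    elbo.backward()
    return m.grad, L.grad
```

Unlike STL, this estimator's gradient variance generally does not vanish at the optimum, which is what the paper's explicit complexity comparison quantifies.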
Innovation

Methods, ideas, or system contributions that make the work stand out.

The STL estimator achieves a geometric (linear) convergence rate
Projected SGD over a domain of triangular scale matrices with a Θ(d) projection
A quadratic gradient-variance bound that holds under variational family misspecification (the pieces are combined in the sketch below)
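Putting the pieces together, a minimal sketch of the projected SGD loop the analysis covers: a single-sample STL gradient step followed by the diagonal-clamping projection. The step size, iteration count, initialization, and `delta` are placeholders, not the paper's settings.

```python
import torch

def bbvi_projected_sgd(log_p, d, steps=5000, lr=1e-3, delta=1e-3):
    """Projected SGD for BBVI with the STL gradient (a sketch).
    `log_p` maps a length-d sample to a scalar joint log-density."""
    m = torch.zeros(d, requires_grad=True)
    L = torch.eye(d, requires_grad=True)
    for _ in range(steps):
        eps = torch.randn(d)
        z = m + L.tril() @ eps                   # reparameterized sample
        q_stop = torch.distributions.MultivariateNormal(
            m.detach(), scale_tril=L.detach().tril())
        loss = q_stop.log_prob(z) - log_p(z)     # negative STL ELBO
        loss.backward()
        with torch.no_grad():
            m -= lr * m.grad
            L -= lr * L.grad.tril()              # stay lower triangular
            L.diagonal().clamp_(min=delta)       # Theta(d) projection
            m.grad.zero_(); L.grad.zero_()
    return m.detach(), L.detach().tril()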
```
Kyurae Kim
PhD Student, University of Pennsylvania
Bayesian inference, stochastic optimization, machine learning, signal processing
Yi-An Ma
University of California San Diego
Jacob R. Gardner
University of Pennsylvania