🤖 AI Summary
To address the $\mathcal{O}(N^2)$ per-iteration complexity of full-rank covariance approximations in black-box variational inference (BBVI) for hierarchical Bayesian models—hindering scalability to large-scale data—this paper introduces structured variational families, including low-rank-plus-diagonal and block-diagonal scale matrices. These structures preserve expressive modeling of local latent variables while reducing computational complexity to $\mathcal{O}(N)$. The paper provides the first rigorous theoretical proof that specific structured approximations achieve this linear complexity, bridging the gap between mean-field and full-rank variational families. Leveraging stochastic gradient optimization and convergence analysis, the authors empirically validate the approach on large-scale hierarchical models, demonstrating significant speedups—multiple times faster than full-rank BBVI—while maintaining accuracy comparable to or exceeding that of mean-field inference.
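To make the complexity difference concrete, here is a minimal sketch (not the paper's code, and with hypothetical names) of why a block-diagonal scale matrix gives linear-in-$N$ cost: with $N$ local variables of dimension $d$, a full-rank scale matrix requires an $O((Nd)^2)$ matrix-vector product per reparameterized sample, whereas storing one small $d \times d$ lower-triangular block per local variable reduces this to $O(N d^2)$.

```python
import numpy as np

# Hedged sketch: reparameterized sampling from a Gaussian variational
# family whose scale matrix is block-diagonal, one small block per
# local latent variable. All names here are illustrative assumptions,
# not the paper's implementation.

rng = np.random.default_rng(0)
N, d = 1000, 3                      # number of local variables, block size

mu = np.zeros((N, d))               # variational means, one row per block
L = np.tile(np.eye(d), (N, 1, 1))   # N lower-triangular d x d scale blocks

def sample_block_diagonal(mu, L, rng):
    """Draw z = mu + L @ eps blockwise: cost O(N * d^2), linear in N."""
    eps = rng.standard_normal(mu.shape)          # (N, d) standard normals
    return mu + np.einsum("nij,nj->ni", L, eps)  # per-block matvec

z = sample_block_diagonal(mu, L, rng)
print(z.shape)  # (1000, 3)
```

A full-rank family would instead hold a single $(Nd) \times (Nd)$ scale matrix, whose matrix-vector product alone scales quadratically with the dataset size, which is the bottleneck the structured families avoid.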
📝 Abstract
Variational families with full-rank covariance approximations are known not to work well in black-box variational inference (BBVI), both empirically and theoretically. In fact, recent computational complexity results for BBVI have established that full-rank variational families scale poorly with the dimensionality of the problem compared to, e.g., mean-field families. This is particularly critical for hierarchical Bayesian models with local variables, whose dimensionality increases with the size of the dataset. Consequently, one gets an iteration complexity with an explicit $\mathcal{O}(N^2)$ dependence on the dataset size $N$. In this paper, we explore a theoretical middle ground between mean-field variational families and full-rank families: structured variational families. We rigorously prove that certain scale matrix structures can achieve a better iteration complexity of $\mathcal{O}\left(N\right)$, implying better scaling with respect to $N$. We empirically verify our theoretical results on large-scale hierarchical models.