AI Summary
This work addresses the slow convergence and numerical instability commonly encountered in large-scale mean-field variational inference (MFVI) in non-conjugate, high-dimensional settings. The authors reformulate MFVI as a constrained finite-sum optimization problem and propose PD-VI, a mini-batch primal-dual algorithm built on an augmented Lagrangian, which jointly updates the global and local variational parameters. To improve robustness and efficiency, they introduce P²D-VI, a block-preconditioned extension that adapts the primal-dual updates to the distinct loss geometry of each parameter block. The convergence analysis of both methods uses a properly chosen constant step size and requires neither conjugacy assumptions nor explicit bounded-variance conditions: the methods converge to a stationary point at an O(1/T) rate in general settings and linearly under strong convexity. Experiments on synthetic data and a large-scale spatial transcriptomics dataset show that both methods outperform existing stochastic variational inference algorithms in convergence speed and solution quality.
Abstract
In this work, we investigate the large-scale mean-field variational inference (MFVI) problem from a mini-batch primal-dual perspective. By reformulating MFVI as a constrained finite-sum problem, we develop a novel primal-dual algorithm based on an augmented Lagrangian formulation, termed primal-dual variational inference (PD-VI). PD-VI jointly updates the global and local variational parameters of the evidence lower bound in a scalable manner. To further account for heterogeneous loss geometry across variational parameter blocks, we introduce a block-preconditioned extension, P$^2$D-VI, which adapts the primal-dual updates to the geometry of each parameter block and improves both numerical robustness and practical efficiency. We establish convergence guarantees for both PD-VI and P$^2$D-VI under a properly chosen constant step size, without relying on conjugacy assumptions or explicit bounded-variance conditions. In particular, we prove $O(1/T)$ convergence to a stationary point in general settings and linear convergence under strong convexity. Numerical experiments on synthetic data and a real large-scale spatial transcriptomics dataset demonstrate that our methods consistently outperform existing stochastic variational inference approaches in terms of convergence speed and solution quality.
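To make the augmented Lagrangian idea concrete, the following is a minimal sketch on a toy problem and is not the PD-VI update rule, which the abstract does not spell out. It assumes a scalar finite sum $\sum_i \tfrac{1}{2}(z - c_i)^2$, gives each term its own copy $x_i$ with the constraint $x_i = z$, and alternates primal and dual steps in the style of standard consensus ADMM. The function name `consensus_primal_dual`, the penalty `rho`, and the data `c` are illustrative assumptions. For simplicity every term is updated at each iteration, so the mini-batch sampling and block preconditioning described above are not reproduced.

```python
import numpy as np


def consensus_primal_dual(c, rho=1.0, num_iters=100):
    """Toy augmented-Lagrangian primal-dual loop (consensus ADMM).

    Solves min_z sum_i 0.5 * (z - c[i])**2 by giving each term its own
    copy x[i] and enforcing x[i] == z with multipliers y[i].
    Illustrative stand-in only; this is not the PD-VI algorithm.
    """
    c = np.asarray(c, dtype=float)
    x = np.zeros_like(c)  # local primal copies, one per summand
    y = np.zeros_like(c)  # dual variables for the constraints x[i] == z
    z = 0.0               # shared (global) primal variable
    for _ in range(num_iters):
        # Local primal step: closed-form minimizer over x_i of
        # 0.5*(x_i - c_i)^2 + y_i*(x_i - z) + 0.5*rho*(x_i - z)^2.
        x = (c - y + rho * z) / (1.0 + rho)
        # Global primal step: minimizer of the augmented Lagrangian in z.
        z = float(np.mean(x + y / rho))
        # Dual ascent step on the constraint residual x_i - z.
        y = y + rho * (x - z)
    return z, x, y


c = np.array([1.0, 2.0, 6.0])
z, x, y = consensus_primal_dual(c)
print(z)                     # close to the mean of c
print(np.abs(x - z).max())   # constraint residual, close to zero
```

On this toy problem the shared variable approaches the mean of `c`, which is the unconstrained minimizer, while the multipliers drive the local copies into agreement with it. The quadratic penalty weighted by `rho` is what distinguishes the augmented Lagrangian from a plain Lagrangian.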