🤖 AI Summary
This work addresses the degradation of the gradient signal-to-noise ratio (SNR) in importance-weighted evidence lower bound (IW-ELBO) optimization in Euclidean space, which hampers the efficiency of variational inference as the number of importance samples grows. To overcome this limitation, the authors formulate IW-ELBO optimization on the Bures–Wasserstein geometry, the manifold of Gaussian distributions equipped with the 2-Wasserstein metric. By deriving the Wasserstein gradient of the IW-ELBO and projecting it onto this manifold, they obtain an efficient and scalable Gaussian variational inference algorithm. Theoretical analysis shows that the gradient SNR of the proposed method grows as Ω(√K) in the number of importance samples K, in sharp contrast to Euclidean estimators, whose SNR vanishes as K increases. The framework extends naturally to the variational Rényi importance-weighted autoencoder bound. Empirical results confirm superior approximation accuracy and improved optimization stability in large-sample regimes compared to existing baselines.
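For reference, the $K$-sample IW-ELBO takes the standard form introduced for importance-weighted autoencoders (the notation below follows the common convention and may differ from the paper's):

$$
\mathcal{L}_K(q) = \mathbb{E}_{z_1,\dots,z_K \overset{\text{iid}}{\sim} q}\left[\log \frac{1}{K}\sum_{k=1}^{K}\frac{p(x, z_k)}{q(z_k)}\right], \qquad \log p(x) \;\ge\; \mathcal{L}_K(q) \;\ge\; \mathcal{L}_1(q) = \mathrm{ELBO}(q),
$$

so increasing $K$ tightens the bound, but, as summarized above, it also degrades the SNR of naive Euclidean gradient estimators.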
📝 Abstract
The Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI), tightening the standard ELBO and mitigating its mode-seeking behaviour. However, optimising the IW-ELBO in Euclidean space is often inefficient, as its gradient estimators suffer from a vanishing signal-to-noise ratio (SNR). This paper formulates the optimisation of the IW-ELBO in Bures–Wasserstein space, the manifold of Gaussian distributions equipped with the 2-Wasserstein metric. We derive the Wasserstein gradient of the IW-ELBO and project it onto the Bures–Wasserstein space to yield a tractable algorithm for Gaussian VI. A pivotal contribution of our analysis concerns the stability of the gradient estimator. While the SNR of the standard Euclidean gradient estimator is known to vanish as the number of importance samples $K$ increases, we prove that the SNR of the Wasserstein gradient scales favourably as $\Omega(\sqrt{K})$, ensuring optimisation efficiency even for large $K$. We further extend this geometric analysis to the Variational Rényi Importance-Weighted Autoencoder bound, establishing analogous stability guarantees. Experiments demonstrate that the proposed framework achieves superior approximation accuracy compared to existing baselines.
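To make the SNR pathology concrete, here is a minimal, self-contained sketch (not the paper's code): it empirically measures the SNR of the standard reparameterized Euclidean gradient of the IW-ELBO on an assumed toy conjugate model $p(z)=\mathcal{N}(0,1)$, $p(x \mid z)=\mathcal{N}(z,1)$ with a Gaussian family $q=\mathcal{N}(m, s^2)$. It does not implement the paper's Bures–Wasserstein update; the model, observation, and parameter values are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (illustrative assumptions throughout; not the paper's code).
# Toy conjugate model: p(z) = N(0, 1), p(x | z) = N(z, 1), so p(z | x) = N(x/2, 1/2).
# Variational family: q(z) = N(m, s^2) with reparameterization z = m + s * eps.

rng = np.random.default_rng(0)
x = 1.0          # a single (assumed) observation
m, s = 0.3, 1.0  # variational parameters, held fixed while probing the SNR


def grad_m_iw_elbo(K):
    """One reparameterized estimate of d(IW-ELBO)/dm from K importance samples.

    With z_k = m + s * eps_k, log q(z_k) is constant in m, so the gradient
    reduces to a self-normalized weighted sum of d/dz log p(x, z_k).
    """
    eps = rng.standard_normal(K)
    z = m + s * eps
    # Log importance weights log p(x, z) - log q(z), up to constants that
    # cancel under self-normalization.
    log_w = (-0.5 * z**2 - 0.5 * (x - z) ** 2) + 0.5 * eps**2
    w = np.exp(log_w - log_w.max())       # stabilized exponentiation
    w_tilde = w / w.sum()                 # self-normalized weights
    dlogp_dz = x - 2.0 * z                # d/dz log p(x, z) for this toy model
    return np.sum(w_tilde * dlogp_dz)


# SNR(K) = |mean| / std of the gradient estimator over repeated draws.
for K in (1, 10, 100, 1000):
    g = np.array([grad_m_iw_elbo(K) for _ in range(2000)])
    print(f"K = {K:4d}   SNR = {abs(g.mean()) / g.std():.3f}")
```

On this toy problem, the measured SNR for the variational parameter $m$ typically shrinks as $K$ grows, consistent with the known $O(1/\sqrt{K})$ decay of Euclidean estimators that the paper's $\Omega(\sqrt{K})$ guarantee for the Wasserstein gradient is designed to overcome.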