🤖 AI Summary
To address the intractability of the variational density in semi-implicit variational inference (SIVI), which prevents direct optimization of the evidence lower bound (ELBO), this paper proposes a particle-based variational inference framework. The method represents the mixing distribution as a nonparametric empirical measure and optimizes the ELBO directly through a particle approximation of a Euclidean–Wasserstein gradient flow, avoiding surrogate bounds, inner-loop MCMC sampling, and minimax objectives. On the theoretical side, the paper establishes the existence and uniqueness of solutions, as well as propagation of chaos, for the gradient flow of a related free energy functional. Experiments show that the approach performs favourably compared with other SIVI methods across a range of tasks.
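A few lines of math show why an empirical mixing measure removes the intractability. The notation here is assumed for illustration and is not taken from the paper: the kernel is $k_\theta$, the mixing measure is $\mu$, the particles are $z_1,\dots,z_M$, and the joint density is $p(x, y)$. The free energy is also simplified to the plain negative ELBO, rather than the regularised functional that the paper analyses.

```latex
\begin{aligned}
q_{\theta,\mu^M}(x) &= \int k_\theta(x \mid z)\,\mu^M(\mathrm{d}z)
  = \frac{1}{M}\sum_{m=1}^{M} k_\theta(x \mid z_m),
  \qquad \mu^M = \frac{1}{M}\sum_{m=1}^{M}\delta_{z_m},\\
\mathcal{E}(\theta,\mu) &= -\mathrm{ELBO}(\theta,\mu)
  = \mathbb{E}_{x\sim q_{\theta,\mu}}\big[\log q_{\theta,\mu}(x) - \log p(x,y)\big],\\
\frac{\delta\mathcal{E}}{\delta\mu}(\theta,\mu)(z) &=
  \mathbb{E}_{x\sim k_\theta(\cdot\mid z)}\big[\log q_{\theta,\mu}(x) - \log p(x,y)\big] + 1,\\
\dot\theta_t &= -\nabla_\theta\mathcal{E}(\theta_t,\mu_t),
  \qquad
  \partial_t\mu_t = \nabla_z\cdot\Big(\mu_t\,\nabla_z\frac{\delta\mathcal{E}}{\delta\mu}(\theta_t,\mu_t)\Big),\\
\dot z_m &= -\nabla_z\frac{\delta\mathcal{E}}{\delta\mu}(\theta_t,\mu^M_t)(z_m),
  \qquad m = 1,\dots,M.
\end{aligned}
```

The first line carries the main point. With an empirical mixing measure, the variational density becomes a finite mixture that can be evaluated pointwise, so the ELBO and its gradients admit unbiased Monte Carlo estimates. The additive constant in the first variation vanishes under $\nabla_z$, and each particle therefore follows the Wasserstein velocity field, while $\theta$ follows an ordinary Euclidean gradient.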
📝 Abstract
Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is not possible, so they resort to one of the following: optimizing bounds on the ELBO, employing costly inner-loop Markov chain Monte Carlo runs, or solving minimax objectives. In this paper, we propose a novel method for SIVI called Particle Variational Inference (PVI), which employs empirical measures to approximate the optimal mixing distribution, characterized as the minimizer of a free energy functional. PVI arises naturally as a particle approximation of a Euclidean–Wasserstein gradient flow and, unlike prior works, it directly optimizes the ELBO whilst making no parametric assumption about the mixing distribution. Our empirical results demonstrate that PVI performs favourably compared to other SIVI methods across various tasks. Moreover, we provide a theoretical analysis of the behaviour of the gradient flow of a related free energy functional, establishing the existence and uniqueness of solutions as well as propagation of chaos results.
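Below is a minimal NumPy sketch of a particle update of this kind. It rests on simplifying assumptions that do not come from the paper:

- The kernel is a Gaussian location kernel $k(x \mid z) = \mathcal{N}(x; z, \sigma^2 I)$, and a single learnable scale $\sigma$ plays the role of the Euclidean parameter $\theta$.
- The target is a hypothetical unit-variance Gaussian.
- The free energy is the plain negative ELBO.
- Time is discretised with plain Euler steps.

Each particle takes a Wasserstein step along minus the gradient of the first variation. That gradient is estimated from reparameterised kernel samples, with the mixture score $\nabla \log q$ held fixed at the current particles. Meanwhile, $\log \sigma$ takes an ordinary gradient step. The names `grad_log_q`, `grad_log_p`, and `pvi_sketch` are illustrative only, and the code is not the authors' implementation.

```python
import numpy as np


def grad_log_q(x, z, sigma):
    """Score of the empirical mixture q(x) = (1/M) sum_m N(x; z_m, sigma^2 I).

    x: (N, D) evaluation points, z: (M, D) particles. Returns an (N, D) array.
    """
    diff = z[None, :, :] - x[:, None, :]                   # (N, M, D): z_m - x_n
    logits = -0.5 * np.sum(diff**2, axis=-1) / sigma**2    # shared normaliser cancels
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                      # responsibilities w_m(x_n)
    return np.einsum("nm,nmd->nd", w, diff) / sigma**2


def grad_log_p(x, target_mean):
    # Hypothetical target: isotropic unit-variance Gaussian centred at target_mean.
    return -(x - target_mean)


def pvi_sketch(target_mean, n_particles=50, n_samples=10, step=0.05,
               n_steps=1000, seed=0):
    """Euler-discretised particle Euclidean-Wasserstein flow (illustrative only)."""
    rng = np.random.default_rng(seed)
    d = target_mean.shape[0]
    z = 0.5 * rng.standard_normal((n_particles, d))   # particles of the mixing measure
    log_sigma = np.log(0.5)                            # Euclidean kernel parameter
    for _ in range(n_steps):
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal((n_particles, n_samples, d))
        x = z[:, None, :] + sigma * eps                # reparameterised draws x ~ k(. | z_m)
        xf = x.reshape(-1, d)
        # g = grad_x [log q(x) - log p(x)], with q frozen at the current particles.
        g = (grad_log_q(xf, z, sigma) - grad_log_p(xf, target_mean)).reshape(x.shape)
        # Wasserstein step: for a location kernel dx/dz = I, so the gradient of the
        # first variation at z_m is the average of g over that particle's samples.
        z_grad = g.mean(axis=1)
        # Euclidean step: dE/dsigma is estimated by E[g . eps]; the chain rule
        # for log sigma multiplies it by sigma.
        sigma_grad = sigma * np.mean(np.sum(g * eps, axis=-1))
        z = z - step * z_grad
        log_sigma = log_sigma - step * sigma_grad
    return z, np.exp(log_sigma)


if __name__ == "__main__":
    target = np.array([2.0, -1.0])
    z, sigma = pvi_sketch(target)
    print("mean of q:", z.mean(axis=0))                 # mixture mean = particle mean
    print("variance of q:", sigma**2 + z.var(axis=0))   # kernel variance + particle spread
```

The mixture density is an explicit finite sum, so its score has a closed form. This is what allows the sketch to follow the ELBO gradient without inner-loop MCMC or a minimax objective. A real model would replace the location kernel with a learned conditional density and the toy target with the model's joint log-density.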