🤖 AI Summary
This work addresses the bias in evidence lower bound (ELBO) optimization arising from non-integrable densities in semi-implicit variational inference, as well as the high computational cost of existing score-matching approaches that require nested optimization. The authors propose Kernel Semi-Implicit Variational Inference (KSIVI), which introduces kernel methods into this framework for the first time. By leveraging an explicit solution in a reproducing kernel Hilbert space, KSIVI eliminates the need for inner-loop optimization and reformulates the objective as a kernel Stein discrepancy (KSD), enabling efficient stochastic gradient optimization. The method avoids complex min-max formulations, supports multi-layer hierarchical extensions to enhance expressiveness, and provides theoretical guarantees, including a variance bound on gradient estimates and a statistical generalization error bound of order Õ(1/√n). Experiments on both synthetic and real-world Bayesian inference tasks demonstrate its effectiveness and scalability.
📝 Abstract
Semi-implicit variational inference (SIVI) enhances the expressiveness of variational families through hierarchical semi-implicit distributions, but the intractability of their densities makes standard ELBO-based optimization biased. Recent score-matching approaches to SIVI (SIVI-SM) address this issue via a minimax formulation, at the expense of an additional lower-level optimization problem. In this paper, we propose kernel semi-implicit variational inference (KSIVI), a principled and tractable alternative that eliminates the lower-level optimization by leveraging kernel methods. We show that when optimizing over a reproducing kernel Hilbert space, the lower-level problem admits an explicit solution, reducing the objective to the kernel Stein discrepancy (KSD). Exploiting the hierarchical structure of semi-implicit distributions, the resulting KSD objective can be efficiently optimized using stochastic gradient methods. We establish optimization guarantees via variance bounds on Monte Carlo gradient estimators and derive statistical generalization bounds of order $\tilde{\mathcal{O}}(1/\sqrt{n})$. We further introduce a multi-layer hierarchical extension that improves expressiveness while preserving tractability. Empirical results on synthetic and real-world Bayesian inference tasks demonstrate the effectiveness of KSIVI.
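To make the KSD objective concrete, the sketch below estimates the squared kernel Stein discrepancy between a set of samples and a target distribution whose score function ∇ log p is available, using a V-statistic over the standard Stein kernel. This is an illustrative reconstruction under common assumptions (an RBF kernel with a fixed bandwidth `h`, a standard Gaussian target in the usage note), not the authors' implementation; in KSIVI the samples would come from the semi-implicit variational family and this quantity would be minimized by stochastic gradient descent.

```python
import numpy as np

def ksd_vstat(x, score_fn, h=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy
    between the empirical distribution of samples x (shape (n, d)) and
    a target p whose score function score_fn(x) = grad log p(x) is known.

    Uses the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)); the
    bandwidth h is a free hyperparameter here (a heuristic choice such
    as the median pairwise distance is common in practice).
    """
    n, d = x.shape
    s = score_fn(x)                           # (n, d) target scores at samples
    diff = x[:, None, :] - x[None, :, :]      # (n, n, d) pairwise x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)           # (n, n) squared distances
    k = np.exp(-sq / (2 * h ** 2))            # RBF kernel matrix

    # Stein kernel u_p(x_i, x_j), assembled term by term:
    t1 = (s @ s.T) * k                                      # s_i^T k_ij s_j
    t2 = np.einsum('id,ijd->ij', s, diff) / h ** 2 * k      # s_i^T grad_y k
    t3 = -np.einsum('jd,ijd->ij', s, diff) / h ** 2 * k     # s_j^T grad_x k
    t4 = (d / h ** 2 - sq / h ** 4) * k                     # tr(grad_x grad_y k)

    return np.mean(t1 + t2 + t3 + t4)        # average over all (i, j) pairs
```

As a quick sanity check, samples drawn from a standard Gaussian target (score `-x`) yield a small KSD value, while samples shifted away from the target yield a much larger one, which is the discrepancy KSIVI drives toward zero during training.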