🤖 AI Summary
This paper investigates the robustness of variational inference (VI) under model misspecification: specifically, whether KL-minimizing VI accurately recovers the mean and correlation matrix of a true posterior density $p$ when $p$ is even-symmetric or ellipsoidally symmetric and the variational family $Q$ is a location-scale family sharing those symmetries, even though $Q$ need not contain $p$. The authors establish rigorous guarantees: under even symmetry, KL-minimizing VI exactly recovers the mean of $p$; under ellipsoidal symmetry, it also exactly recovers the correlation matrix. These results hold under common misspecifications, including factorized approximations and heavy- or light-tailed mismatches between $q$ and $p$. The analysis exploits the symmetry structure of the KL objective over location-scale families. Experiments indicate that estimation error degrades gracefully as the symmetry assumptions weaken. Together, these results provide theoretical foundations and design guidance for Bayesian approximate inference. A schematic restatement of the objective and the two guarantees is given below.
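The following is a schematic restatement of the KL objective and the two guarantees summarized above, in notation introduced here only for illustration ($q^{\star}$, $\mu$, $\mathrm{Corr}$ are our labels); the paper states the precise regularity conditions.

```latex
% Objective: KL-minimizing VI over a variational family Q (illustrative notation).
q^{\star} \;=\; \operatorname*{arg\,min}_{q \in Q} \, \mathrm{KL}(q \,\|\, p)
          \;=\; \operatorname*{arg\,min}_{q \in Q} \, \mathbb{E}_{q}\big[\log q(x) - \log p(x)\big]

% (i)  Even symmetry: if p(\mu + x) = p(\mu - x) for all x, and Q is a
%      location-scale family sharing this symmetry, then
%      \mathbb{E}_{q^{\star}}[x] \;=\; \mathbb{E}_{p}[x] \;=\; \mu .

% (ii) Elliptical symmetry: if in addition p is ellipsoidally symmetric, then
%      \mathrm{Corr}(q^{\star}) \;=\; \mathrm{Corr}(p) .
```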
📝 Abstract
Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $Q$. This is typically done by minimizing the exclusive Kullback-Leibler divergence, $\mathrm{KL}(q\,\|\,p)$. In practice, $Q$ is not rich enough to contain $p$, and the approximation is misspecified even when it is a unique global minimizer of $\mathrm{KL}(q\,\|\,p)$. In this paper, we analyze the robustness of VI to these misspecifications when $p$ exhibits certain symmetries and $Q$ is a location-scale family that shares these symmetries. We prove strong guarantees for VI not only under mild regularity conditions but also in the face of severe misspecifications. Namely, we show that (i) VI recovers the mean of $p$ when $p$ exhibits an *even* symmetry, and (ii) it recovers the correlation matrix of $p$ when in addition $p$ exhibits an *elliptical* symmetry. These guarantees hold for the mean even when $q$ is factorized and $p$ is not, and for the correlation matrix even when $q$ and $p$ behave differently in their tails. We analyze various regimes of Bayesian inference where these symmetries are useful idealizations, and we also investigate experimentally how VI behaves in their absence.
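As a rough numerical illustration of guarantee (i), not taken from the paper, the sketch below fits a factorized (mean-field) Gaussian to a correlated, hence non-factorized, even-symmetric target by minimizing a Monte Carlo estimate of $\mathrm{KL}(q\,\|\,p)$, and checks that the fitted mean matches the target mean. All names, targets, and optimizer settings here are our own assumptions.

```python
# A minimal sketch (not the authors' code): KL-minimizing VI with a factorized
# Gaussian q recovers the mean of an even-symmetric, non-factorized target p.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Target p: a correlated 2-D Gaussian, even-symmetric about mu_p but not factorized.
mu_p = np.array([1.0, -2.0])
Sigma_p = np.array([[1.0, 0.8],
                    [0.8, 1.0]])
Sigma_p_inv = np.linalg.inv(Sigma_p)

def log_p(x):
    # Unnormalized log-density of p; the normalizer cancels in argmin KL(q||p).
    d = x - mu_p
    return -0.5 * np.einsum("ij,jk,ik->i", d, Sigma_p_inv, d)

# Variational family Q: factorized Gaussians q = N(m, diag(s^2)).
eps = rng.standard_normal((5000, 2))
eps -= eps.mean(axis=0)  # center the fixed samples to reduce Monte Carlo bias in this demo

def kl_estimate(params):
    m, log_s = params[:2], params[2:]
    s = np.exp(log_s)
    x = m + s * eps  # reparameterized samples from q
    log_q = -0.5 * np.sum(((x - m) / s) ** 2 + 2 * log_s + np.log(2 * np.pi), axis=1)
    # Monte Carlo estimate of KL(q||p), up to an additive constant.
    return np.mean(log_q - log_p(x))

res = minimize(kl_estimate, x0=np.zeros(4), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-10})
m_hat = res.x[:2]
print("target mean:", mu_p)
print("fitted mean:", m_hat)  # should be close to mu_p despite the factorized q
```

The fitted variances will be too small (a well-known feature of the exclusive KL), but the fitted mean should track the target mean, which is the point of guarantee (i).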