🤖 AI Summary
Variational inference (VI) commonly relies on factorized (mean-field) approximations to quantify uncertainty tractably; when the true posterior does not factorize, however, such approximations incur fundamental trade-offs in uncertainty estimation. Method: We derive the first impossibility theorem for VI, characterizing the incompatibility among accurate estimation of marginal variances, marginal precisions, and the generalized variance (which relates to the differential entropy) under factorized approximations. The analysis centers on Gaussian targets with factorized Gaussian surrogates, with empirical extensions to non-Gaussian settings. Contribution/Results: We establish a strict ordering among divergence objectives, including the KL divergences, $\alpha$-divergences, and a score-based divergence, according to which uncertainty measure each preserves. This theoretical ordering remains robust beyond the Gaussian case, yielding an interpretable, generalizable principle for divergence selection in VI.
📝 Abstract
Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though $p$ itself does not factorize. We show that this mismatch leads to an impossibility theorem: if $p$ does not factorize, then any factorized approximation $q \in Q$ can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which can be related to the entropy). In practice, the best variational approximation in $Q$ is found by minimizing some divergence $D(q,p)$ between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general $\alpha$-divergences, and a score-based divergence which compares $\nabla \log p$ and $\nabla \log q$. We provide a thorough theoretical analysis in the setting where $p$ is a Gaussian and $q$ is a (factorized) Gaussian. We show that all the considered divergences can be *ordered* based on the estimates of uncertainty they yield as objective functions for VI. Finally, we empirically evaluate the validity of this ordering when the target distribution $p$ is not Gaussian.
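The Gaussian case admits a closed-form illustration of the trade-off. Below is a minimal sketch (not the paper's code) for a 2-D correlated Gaussian target: the reverse-KL mean-field solution is known to match the diagonal of the target precision matrix, while the forward-KL solution matches the true marginal variances; neither recovers the generalized variance $\det \Sigma$.

```python
import numpy as np

# Hypothetical 2-D example: fit a factorized Gaussian q to a correlated
# Gaussian target p under two divergences and compare which uncertainty
# measure each objective preserves.

rho = 0.8                                   # correlation in the target p
Sigma = np.array([[1.0, rho], [rho, 1.0]])  # target covariance
Lam = np.linalg.inv(Sigma)                  # target precision matrix

# Reverse KL, KL(q || p): the optimal factorized Gaussian matches the
# diagonal of the precision matrix, so marginal precisions are exact
# but marginal variances shrink by a factor (1 - rho^2).
var_reverse = 1.0 / np.diag(Lam)

# Forward KL, KL(p || q): the optimal factorized Gaussian matches the
# true marginals, so marginal variances are exact.
var_forward = np.diag(Sigma)

print("true marginal variances:", np.diag(Sigma))
print("reverse-KL variances   :", var_reverse)  # underestimated
print("forward-KL variances   :", var_forward)  # exact

# Generalized variance det(Sigma): the factorized solutions bracket it
# from below (reverse KL) and above (forward KL), consistent with the
# ordering of divergences discussed in the abstract.
print("det Sigma     :", np.linalg.det(Sigma))
print("reverse-KL det:", np.prod(var_reverse))
print("forward-KL det:", np.prod(var_forward))
```

Replacing the correlation `rho` or the dimension changes the magnitudes but not the direction of the biases, which is the point of the ordering result.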