🤖 AI Summary
Function-space variational inference (VI) in Bayesian neural networks (BNNs) avoids the challenge of specifying weight priors but suffers from an ill-posed evidence lower bound (ELBO), which Burt et al. (2020) showed often evaluates to negative infinity. Method: We propose the first well-defined function-space VI objective based on a regularized KL divergence, rigorously supporting Gaussian process (GP) priors and resolving the ill-posedness of the ELBO. Our approach unifies generalized VI with function-space modeling, ensuring both theoretical soundness and computational tractability. Contribution/Results: Experiments on synthetic and small-scale real-world datasets demonstrate that our method faithfully recovers the behavior specified by the GP prior. In regression, classification, and out-of-distribution detection, it yields significantly better-calibrated uncertainty estimates than both weight-space VI and existing function-space VI baselines.
📝 Abstract
Bayesian neural networks (BNNs) promise to combine the predictive performance of neural networks with principled uncertainty modeling, which is important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on its weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinity for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification, and out-of-distribution detection compared to BNN baselines with both function- and weight-space priors.
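To make the role of the regularization concrete, here is a minimal finite-dimensional sketch. It is not the paper's implementation: the function names, the value of `gamma`, and the choice to jitter both covariances are illustrative assumptions. It only shows the basic mechanism by which adding a `gamma * I` term to the covariances keeps a Gaussian KL divergence finite even when one covariance is singular, a finite-dimensional analogue of why a regularized KL divergence can stay well-defined where the standard function-space KL (and hence the ELBO) blows up.

```python
import numpy as np

def gaussian_kl(m1, S1, m2, S2):
    """KL(N(m1, S1) || N(m2, S2)) between k-dimensional Gaussians."""
    k = m1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = m2 - m1
    _, logdet1 = np.linalg.slogdet(S1)  # log|S1|, robust for large k
    _, logdet2 = np.linalg.slogdet(S2)
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - k
                  + logdet2 - logdet1)

def regularized_kl(m1, S1, m2, S2, gamma=0.1):
    # Illustrative regularization (an assumption, not the paper's exact
    # definition): inflate both covariances by gamma * I before taking
    # the KL. The jitter keeps the divergence finite even when S1 is
    # singular, where the plain KL would be infinite.
    I = np.eye(S1.shape[0])
    return gaussian_kl(m1, S1 + gamma * I, m2, S2 + gamma * I)
```

For example, with a rank-deficient posterior covariance `S1 = np.ones((3, 3))` and a standard-normal prior, `gaussian_kl` diverges (log-determinant of a singular matrix), while `regularized_kl` returns a finite, nonnegative value.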