🤖 AI Summary
This work addresses the challenge of specifying prior distributions in Bayesian deep learning—particularly the difficulty of performing inference in the absence of initial beliefs—by proposing a learnable weight prior mechanism that, for the first time, integrates Bayesian neural networks with probabilistic meta-learning. Framed within the neural process paradigm, the approach treats network weights as latent variables and learns a shared prior across multiple tasks. Amortized variational inference is employed to efficiently approximate task-specific posteriors. The resulting model supports minibatch training within tasks and excels in meta-learning under extreme data scarcity. Beyond making Bayesian neural network behaviour easier to study under well-specified priors, the framework also functions as a flexible generative model, accomplishing neural process tasks previously deemed infeasible.
📝 Abstract
One of the core facets of Bayesianism is the updating of prior beliefs in light of new evidence -- so how can we maintain a Bayesian approach if we have no prior beliefs in the first place? This is one of the central challenges in Bayesian deep learning, where it is not clear how to represent beliefs about a prediction task as prior distributions over model parameters. Bridging the fields of Bayesian deep learning and probabilistic meta-learning, we introduce a way to *learn* a prior over weights from a collection of datasets via per-dataset amortised variational inference. The resulting model can be viewed as a neural process whose latent variable is the set of weights of a Bayesian neural network (BNN) and whose decoder is the neural network parameterised by a sample of the latent variable itself. This unique model allows us to study the behaviour of BNNs under well-specified priors, use BNNs as flexible generative models, and perform desirable but previously elusive feats in neural processes, such as within-task minibatching and meta-learning under extreme data starvation.
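To make the generative structure concrete, here is a minimal toy sketch (not the paper's implementation) of a neural process whose latent variable is a flat BNN weight vector: functions are generated by sampling weights from a shared diagonal-Gaussian prior and running the network with those weights, and the closed-form Gaussian KL between a task posterior and that prior is the regulariser that would appear in a per-task variational bound. The network sizes, the fixed prior parameters, and the placeholder posterior are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "learned" prior over BNN weights: one diagonal Gaussian per weight.
# In the paper's framing these parameters would be meta-learned across
# datasets; here they are fixed placeholders for illustration.
D_IN, D_HID = 1, 8
N_W = D_IN * D_HID + D_HID + D_HID + 1  # two-layer MLP, scalar output
prior_mu = np.zeros(N_W)
prior_log_sigma = np.full(N_W, -1.0)

def unpack(w):
    """Split a flat weight vector into MLP parameters."""
    i = 0
    W1 = w[i:i + D_IN * D_HID].reshape(D_IN, D_HID); i += D_IN * D_HID
    b1 = w[i:i + D_HID]; i += D_HID
    W2 = w[i:i + D_HID].reshape(D_HID, 1); i += D_HID
    b2 = w[i:i + 1]
    return W1, b1, W2, b2

def decoder(x, w):
    """The 'decoder' is just the network run with sampled weights w."""
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

# Sample functions from the prior: each weight draw defines one function.
x = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)
preds = np.stack([
    decoder(x, prior_mu + np.exp(prior_log_sigma) * rng.standard_normal(N_W))
    for _ in range(20)
])  # shape (20, 50, 1): 20 sampled functions on 50 inputs

# Closed-form KL(q || p) between a diagonal-Gaussian task posterior q
# (here a placeholder; amortised inference would produce it from the
# task's data) and the shared prior p.
q_mu = 0.1 * rng.standard_normal(N_W)
q_log_sigma = np.full(N_W, -1.5)
kl = 0.5 * np.sum(
    np.exp(2 * (q_log_sigma - prior_log_sigma))
    + ((q_mu - prior_mu) / np.exp(prior_log_sigma)) ** 2
    - 1.0
    + 2 * (prior_log_sigma - q_log_sigma)
)
print(preds.shape, kl >= 0.0)
```

The spread of the sampled functions in `preds` is what a well-specified weight prior controls; meta-learning would adjust `prior_mu` and `prior_log_sigma` so that this function distribution matches the collection of training datasets.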