Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

📅 2024-07-26
🏛️ arXiv.org
📈 Citations: 4
Influential: 0

🤖 AI Summary
Problem: Finite-width neural networks have no known Gaussian process (GP) approximation with provable error bounds; the equivalence between neural networks and GPs holds only in the infinite-width or infinite-depth limit. Method: The paper proposes a framework that approximates a finite network with a Gaussian process mixture (GPM) and certifies the approximation error. Using the Wasserstein distance to measure closeness, it approximates the output distribution of each layer in turn as a mixture of GPs and, for any ε > 0, returns a mixture that is ε-close to the network on a finite set of input points, even when the network parameters are not i.i.d. The approach combines tools from optimal transport and Gaussian processes, and the resulting error bound is differentiable, so it can be used to tune network parameters to mimic a given GP, for example for prior selection. Results: Experiments on regression and classification tasks with various architectures examine the quality of the approximation and suggest uses in uncertainty quantification and Bayesian prior design.

📝 Abstract
Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes with error bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $\epsilon>0$ our approach is able to return a mixture of Gaussian processes that is $\epsilon$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
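
The abstract measures closeness with the Wasserstein distance at a finite set of input points. For intuition only, the sketch below computes the 2-Wasserstein distance between two Gaussian distributions, which has a well-known closed form. Here the two Gaussians are zero-mean GPs evaluated on the same five inputs. The RBF kernel, the lengthscales, and the function names (`rbf_gram`, `psd_sqrt`, `gaussian_w2`) are illustrative assumptions and are not taken from the paper; the paper's bounds for mixtures and for network layers are not reproduced here.

```python
import numpy as np

def rbf_gram(inputs, lengthscale):
    """Gram matrix of an RBF kernel on 1-D inputs (hypothetical example kernel)."""
    diffs = inputs[:, None] - inputs[None, :]
    return np.exp(-0.5 * (diffs / lengthscale) ** 2)

def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    sym = (mat + mat.T) / 2.0                # remove tiny asymmetries
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)    # guard against small negative values
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T

def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b).

    Uses W2^2 = ||m_a - m_b||^2 + tr(S_a + S_b - 2 (S_b^{1/2} S_a S_b^{1/2})^{1/2}).
    """
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))  # clamp round-off below zero

# Two zero-mean GPs evaluated at the same finite set of input points.
inputs = np.linspace(0.0, 1.0, 5)
zero_mean = np.zeros(inputs.shape[0])
distance = gaussian_w2(zero_mean, rbf_gram(inputs, 0.5),
                       zero_mean, rbf_gram(inputs, 0.6))
print(f"W2 between the two finite-dimensional GP marginals: {distance:.4f}")
```

Restricting to a finite input set is what makes this computable: each GP reduces to a multivariate Gaussian whose covariance is the kernel Gram matrix on those inputs.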
Problem

Research questions and friction points this paper is trying to address.

Approximating finite neural networks with mixtures of Gaussian processes
Providing provable bounds on the approximation error
Enabling prior selection for Bayesian inference via a differentiable error bound
Innovation

Methods, ideas, or system contributions that make the work stand out.

Finite neural networks approximated by mixtures of Gaussian processes
Wasserstein distance quantifies closeness between probabilistic models
Differentiable error bound enables tuning NN parameters for prior selection
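
As a toy illustration of why a differentiable distance is useful for prior selection, the sketch below tunes the mean and standard deviation of a one-dimensional Gaussian by gradient descent so that it matches a target Gaussian. It uses the exact squared 2-Wasserstein distance between 1-D Gaussians, (m - m_t)^2 + (s - s_t)^2, as the objective. This is an assumption-laden stand-in: the paper differentiates its own error bound with respect to neural network parameters, which is not reproduced here, and the function names, step size, and starting point are hypothetical.

```python
def w2_squared_1d(mean, std, target_mean, target_std):
    """Squared 2-Wasserstein distance between two 1-D Gaussians."""
    return (mean - target_mean) ** 2 + (std - target_std) ** 2

def w2_squared_grad(mean, std, target_mean, target_std):
    """Analytic gradient of w2_squared_1d with respect to (mean, std)."""
    return 2.0 * (mean - target_mean), 2.0 * (std - target_std)

def fit_to_target(target_mean, target_std, steps=200, learning_rate=0.1):
    """Plain gradient descent on the squared distance, starting from N(0, 1)."""
    mean, std = 0.0, 1.0
    for _ in range(steps):
        grad_mean, grad_std = w2_squared_grad(mean, std, target_mean, target_std)
        mean -= learning_rate * grad_mean
        # std moves monotonically toward a positive target, so it stays positive
        std -= learning_rate * grad_std
    return mean, std

fitted_mean, fitted_std = fit_to_target(target_mean=2.0, target_std=0.5)
print(f"fitted mean={fitted_mean:.4f}, std={fitted_std:.4f}")
```

In the paper's setting the tuned quantities would be the parameters of the network and the target would be a given Gaussian process; the toy only shows the mechanism of descending a differentiable distance.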