Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

📅 2024-07-26

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

Problem: Finite-width deep neural networks currently lack analytically tractable Gaussian process (GP) approximations with provable error bounds. Method: The paper proposes a Gaussian Process Mixture (GPM) approximation framework with certified error bounds. It uses the Wasserstein distance to approximate the output distribution of each layer in turn, and for any ε > 0 returns a mixture of GPs that is ε-close to the network at a finite set of input points, without requiring the network parameters to be i.i.d. The method combines optimal transport theory with layer-wise probabilistic modeling, and the resulting error bound is differentiable, so it can be used to tune network parameters toward a user-specified GP prior. Results: Experiments on regression and classification tasks show that the approximation accuracy can be controlled, and that the framework supports uncertainty quantification and Bayesian prior design for finite-width networks.

📝 Abstract

Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes with error bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $\epsilon>0$ our approach is able to return a mixture of Gaussian processes that is $\epsilon$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
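To make the notion of closeness in the abstract concrete, the following is a minimal sketch of the 2-Wasserstein distance in the simplest possible setting: univariate Gaussians, plus an upper bound for two mixtures that share the same weights. It is an illustration only and not the paper's algorithm. The function names `w2_gaussian_1d` and `w2_mixture_upper_bound` are hypothetical, and the one-dimensional, shared-weights setting is an assumption made to keep the example short.

```python
import math


def w2_gaussian_1d(m1, s1, m2, s2):
    """Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2).

    s1 and s2 are standard deviations (assumed non-negative).
    """
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def w2_mixture_upper_bound(weights, comps_p, comps_q):
    """Upper bound on W2 between two Gaussian mixtures with shared weights.

    Component i of the first mixture is coupled with component i of the
    second. Because W2^2 is jointly convex in its two arguments,
        W2^2(sum_i w_i P_i, sum_i w_i Q_i) <= sum_i w_i W2^2(P_i, Q_i).
    Each component is a (mean, std) pair. This pairing is an illustrative
    assumption, not a construction taken from the paper.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for w, (mp, sp), (mq, sq) in zip(weights, comps_p, comps_q):
        total += w * w2_gaussian_1d(mp, sp, mq, sq) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    # Distance between N(0, 1) and N(3, 25): sqrt(3^2 + 4^2)
    print(w2_gaussian_1d(0.0, 1.0, 3.0, 5.0))
    # Two-component mixtures that differ only in the first component
    print(w2_mixture_upper_bound([0.5, 0.5],
                                 [(0.0, 1.0), (0.0, 1.0)],
                                 [(3.0, 5.0), (0.0, 1.0)]))
```

The bound is zero exactly when every paired component matches, which gives a rough sense of why a per-component error can be aggregated into an error for the whole mixture.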

Problem

Research questions and friction points this paper is trying to address.

Approximating finite neural networks with mixtures of Gaussian processes

Providing provable bounds on the error of that approximation

Enabling prior selection for Bayesian inference via the differentiability of the error bound

Innovation

Methods, ideas, or system contributions that make the work stand out.

Finite neural networks approximated by mixtures of Gaussian processes

Wasserstein distance quantifies the closeness between probabilistic models

Differentiable error bounds enable tuning network parameters for prior selection
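The idea behind the last point, that a differentiable distance can drive parameter tuning toward a target distribution, can be sketched with a toy example. The code below runs plain gradient descent on the squared 2-Wasserstein distance between a univariate Gaussian and a fixed target Gaussian. It is not the paper's procedure: the paper differentiates an error bound for a neural network, whereas this sketch assumes a single Gaussian with hand-derived gradients. The function name `fit_gaussian_to_target`, the learning rate, and the step count are all hypothetical choices.

```python
def fit_gaussian_to_target(m, s, m_target, s_target, lr=0.1, steps=100):
    """Gradient descent on W2^2 between N(m, s^2) and N(m_target, s_target^2).

    For univariate Gaussians, W2^2 = (m - m_target)^2 + (s - s_target)^2,
    so the gradients with respect to m and s are available in closed form.
    Returns the tuned (m, s) and the final squared distance.
    """
    for _ in range(steps):
        grad_m = 2.0 * (m - m_target)  # d(W2^2)/dm
        grad_s = 2.0 * (s - s_target)  # d(W2^2)/ds
        m -= lr * grad_m
        s -= lr * grad_s
    final_sq_dist = (m - m_target) ** 2 + (s - s_target) ** 2
    return m, s, final_sq_dist


if __name__ == "__main__":
    # Start from N(0, 1) and tune toward the target N(2, 0.25) (std 0.5)
    m, s, d2 = fit_gaussian_to_target(0.0, 1.0, 2.0, 0.5)
    print(m, s, d2)
```

With a learning rate of 0.1, each step shrinks the gap to the target by a constant factor, so the distance decays geometrically. In a real setting the gradients would come from automatic differentiation rather than a closed form.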

🔎 Similar Papers

Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions