Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks

📅 2026-02-26
📈 Citations: 0
Influential Citations: 0
📄 PDF
🤖 AI Summary
Existing Gaussian-process limits, such as the Neural Network Gaussian Process (NNGP), fail to capture the rare but statistically dominant non-Gaussian fluctuations, and the feature-learning mechanisms behind them, in the posterior of wide Bayesian neural networks. This work introduces large deviation theory to this setting for the first time, formulating a rate function at the level of predictive functions as an explicit variational objective. The resulting framework jointly optimizes the predictor and a data-dependent kernel, moving beyond the conventional fixed-kernel assumption. This approach yields a functional-level characterization of posterior non-Gaussianity, accounts for finite-width effects in moderately wide networks, and captures key phenomena such as posterior deformation, non-Gaussian tails, and adaptive kernel selection.
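
To make the "joint optimization over predictor and kernel" concrete, the display below sketches the generic shape such a rate-function objective can take. All notation here (f for the predictor restricted to the n training inputs, K for the internal kernel, K_0 for the prior NNGP kernel, N for the width) and the specific Gaussian-relative-entropy form of the kernel cost are illustrative assumptions, not the paper's own expressions:

```latex
% Schematic only: f = predictor on the n training inputs, K_0 = fixed prior
% (NNGP) kernel matrix, K = data-dependent kernel, N = network width.
\[
  P_{\mathrm{prior}}(f) \asymp e^{-N\, I(f)}, \qquad
  I(f) \;=\; \inf_{K \succ 0} \Big[
      \tfrac{1}{2}\, f^{\top} K^{-1} f
      \;+\; \tfrac{1}{2}\big( \operatorname{tr}(K_0^{-1} K)
        - \log \det (K_0^{-1} K) - n \big)
  \Big].
\]
% The second term is the relative entropy between the Gaussians N(0, K) and
% N(0, K_0): the "cost" of deforming the prior kernel. Conditioning on data
% adds the likelihood on top of N I(f), so the same joint optimization over
% (f, K) governs posterior concentration.
```

Under this schematic form, freezing K = K_0 recovers a purely Gaussian, NNGP-style quadratic objective; letting K vary is what would encode data-dependent kernel selection and non-Gaussian tails.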

📝 Abstract
We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emergent notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel-selection effects.
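
The following is a minimal, self-contained numerical sketch of the kind of finite-width tail comparison the abstract alludes to. The toy model (one tanh hidden layer, scalar input, standard-normal weights) and every constant below are assumptions made for illustration, not the paper's experimental setup:

```python
# Monte Carlo look at how finite-width prior outputs deviate from the
# NNGP Gaussian in the tails (toy model, not the paper's setup).
from math import erfc, sqrt

import numpy as np

rng = np.random.default_rng(0)

def sample_outputs(width, n, x=1.0, chunk=50_000):
    """Draw prior outputs f(x) = v . tanh(w x) / sqrt(width), in memory-safe chunks."""
    parts = []
    for start in range(0, n, chunk):
        m = min(chunk, n - start)
        w = rng.standard_normal((m, width))   # input-to-hidden weights
        v = rng.standard_normal((m, width))   # hidden-to-output weights
        parts.append((v * np.tanh(w * x)).sum(axis=1) / np.sqrt(width))
    return np.concatenate(parts)

n = 400_000
# NNGP variance at x = 1: k0 = E[tanh(w)^2], estimated by Monte Carlo.
k0 = float(np.mean(np.tanh(rng.standard_normal(n)) ** 2))
t = 3.0 * sqrt(k0)                    # probe a 3-sigma tail event
gauss_tail = erfc(t / sqrt(2 * k0))   # Gaussian (NNGP) prediction for P(|f| > t)

for width in (4, 16, 128):
    f = sample_outputs(width, n)
    print(f"width={width:4d}  P(|f|>3σ): empirical={np.mean(np.abs(f) > t):.2e}  "
          f"NNGP={gauss_tail:.2e}")
```

At small widths the empirical tail probability should sit above the fixed-kernel Gaussian value and approach it as the width grows, consistent with the non-Gaussian tails the abstract reports; a rate function of the kind sketched above is what would quantify that gap.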
Problem

Research questions and friction points this paper is trying to address.

Bayesian Neural Networks
Large Deviations
Feature Learning
NNGP
Posterior Concentration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Deviations
Bayesian Neural Networks
Feature Learning
Rate Function
NNGP