Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks

📅 2026-02-26
📈 Citations: 0
Influential Citations: 0
📄 PDF
🤖 AI Summary
Existing Gaussian-process limits, such as the Neural Network Gaussian Process (NNGP), fail to capture the rare but statistically dominant non-Gaussian fluctuations, and the feature-learning mechanisms behind them, in the posterior of wide Bayesian neural networks. This work introduces large deviation theory to this setting for the first time, formulating a rate function at the level of predictive functions as an explicit variational objective. The resulting framework jointly optimizes the predictor and a data-dependent kernel, moving beyond the conventional fixed-kernel assumption. This approach yields a functional-level characterization of posterior non-Gaussianity, accounts for finite-width effects in moderately wide networks, and captures key phenomena such as posterior deformation, non-Gaussian tails, and adaptive kernel selection.
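
To make the "joint optimization over predictor and kernel" concrete, the display below sketches the generic shape such a rate-function objective can take. All notation here (f for the predictor restricted to the n training inputs, K for the internal kernel, K_0 for the prior NNGP kernel, N for the width) and the specific Gaussian-relative-entropy form of the kernel cost are illustrative assumptions, not the paper's own expressions:

```latex
% Schematic only: f = predictor on the n training inputs, K_0 = fixed prior
% (NNGP) kernel matrix, K = data-dependent kernel, N = network width.
\[
  P_{\mathrm{prior}}(f) \asymp e^{-N\, I(f)}, \qquad
  I(f) \;=\; \inf_{K \succ 0} \Big[
      \tfrac{1}{2}\, f^{\top} K^{-1} f
      \;+\; \tfrac{1}{2}\big( \operatorname{tr}(K_0^{-1} K)
        - \log \det (K_0^{-1} K) - n \big)
  \Big].
\]
% The second term is the relative entropy between the Gaussians N(0, K) and
% N(0, K_0): the "cost" of deforming the prior kernel. Conditioning on data
% adds the likelihood on top of N I(f), so the same joint optimization over
% (f, K) governs posterior concentration.
```

Under this schematic form, freezing K = K_0 recovers a purely Gaussian, NNGP-style quadratic objective; letting K vary is what would encode data-dependent kernel selection and non-Gaussian tails.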

📝 Abstract
We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emergent notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel-selection effects.
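
The following is a minimal, self-contained numerical sketch of the kind of finite-width tail comparison the abstract alludes to. The toy model (one tanh hidden layer, scalar input, standard-normal weights) and every constant below are assumptions made for illustration, not the paper's experimental setup:

```python
# Monte Carlo look at how finite-width prior outputs deviate from the
# NNGP Gaussian in the tails (toy model, not the paper's setup).
from math import erfc, sqrt

import numpy as np

rng = np.random.default_rng(0)

def sample_outputs(width, n, x=1.0, chunk=50_000):
    """Draw prior outputs f(x) = v . tanh(w x) / sqrt(width), in memory-safe chunks."""
    parts = []
    for start in range(0, n, chunk):
        m = min(chunk, n - start)
        w = rng.standard_normal((m, width))   # input-to-hidden weights
        v = rng.standard_normal((m, width))   # hidden-to-output weights
        parts.append((v * np.tanh(w * x)).sum(axis=1) / np.sqrt(width))
    return np.concatenate(parts)

n = 400_000
# NNGP variance at x = 1: k0 = E[tanh(w)^2], estimated by Monte Carlo.
k0 = float(np.mean(np.tanh(rng.standard_normal(n)) ** 2))
t = 3.0 * sqrt(k0)                    # probe a 3-sigma tail event
gauss_tail = erfc(t / sqrt(2 * k0))   # Gaussian (NNGP) prediction for P(|f| > t)

for width in (4, 16, 128):
    f = sample_outputs(width, n)
    print(f"width={width:4d}  P(|f|>3σ): empirical={np.mean(np.abs(f) > t):.2e}  "
          f"NNGP={gauss_tail:.2e}")
```

At small widths the empirical tail probability should sit above the fixed-kernel Gaussian value and approach it as the width grows, consistent with the non-Gaussian tails the abstract reports; a rate function of the kind sketched above is what would quantify that gap.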
Problem

Research questions and friction points this paper is trying to address.

Bayesian Neural Networks
Large Deviations
Feature Learning
NNGP
Posterior Concentration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Deviations
Bayesian Neural Networks
Feature Learning
Rate Function
NNGP