Kernel Renormalization in Bayesian Deep Neural Networks: the Equivalent Wishart Ansatz in the Proportional Regime

📅 2026-05-28
🤖 AI Summary
This work addresses the problem of predicting generalization in deep nonlinear Bayesian neural networks in the proportional regime, where the training set size P and the network width N grow at the same rate. The authors propose an equivalent Wishart Ansatz to characterize the dominant random fluctuations of the layerwise empirical kernels in multilayer perceptrons, and combine it with a large deviation analysis to derive a partition function expressed in terms of a renormalized Neural Network Gaussian Process (NNGP) kernel. This extends non-perturbative results, previously studied mainly for shallow single-hidden-layer networks, to deep Bayesian MLPs of fixed depth L, where representation learning is encoded in at most L scalar order parameters determined self-consistently. An extension to CNNs identifies a hierarchical local kernel renormalization mechanism that quantifies data-dependent transformations of the large-width kernel due to finite-width effects. The theory is tested against posterior sampling of Bayesian networks with depth of order 10 and training sets of size P ∼ 10³ on classic benchmark datasets, showing overall very good agreement together with two distinct types of systematic deviations.
📝 Abstract
The scaling limit where both the size of the training set $P$ and the width $N$ of a deep neural network grow at the same rate, the so-called proportional-width regime, has been intensely studied for shallow, single-hidden-layer networks. However, extending these non-perturbative results from shallow architectures to deep non-linear networks has proven very challenging. Here we present an effective approximate approach to predict the generalization performance of Bayesian multi-layer perceptrons (MLPs) of fixed depth $L$ on arbitrary high-dimensional data. We propose an equivalent Wishart Ansatz to capture the dominant stochastic fluctuations of the hierarchical empirical kernels of MLPs. This allows us to perform a large deviation analysis for the partition function of MLPs in the proportional limit, expressed in terms of a renormalized NNGP kernel. In this description, even strong representation learning in the proportional limit is encoded in at most $L$ scalar order parameters, determined self-consistently. Extending the approach to convolutional architectures (CNNs), we identify a hierarchical local kernel renormalization mechanism, which allows us to quantify more complex data-dependent transformations of the large-width kernel in CNNs due to finite-width effects. We test our effective theory against sampling experiments from the Bayesian posterior of finite deep neural networks with depths $L \sim O(10)$ and $P\sim O(10^3)$ on classic benchmark datasets, finding overall very good agreement together with two distinct types of systematic deviations.
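As a rough illustration of the objects the abstract refers to, the sketch below compares the infinite-width NNGP kernel of a ReLU MLP with the empirical kernel of a finite-width random network. It is not the paper's renormalized theory: it only shows the standard large-width baseline and the finite-width kernel fluctuations that an Ansatz of this kind is meant to capture. The ReLU activation, the absence of biases, the weight-variance scaling of 2, and all function names are assumptions made for this example.

```python
import numpy as np

def nngp_relu_kernel(X, depth):
    """Infinite-width (NNGP) kernel of a bias-free ReLU MLP.

    Assumes weight variance 2 / fan_in, so the kernel diagonal is preserved
    from layer to layer (standard arc-cosine kernel recursion).
    """
    K = X @ X.T / X.shape[1]                      # input kernel, shape (P, P)
    for _ in range(depth):
        std = np.sqrt(np.diag(K))
        norm = np.outer(std, std)
        cos = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos)
        # 2 * E[relu(u) relu(v)] for (u, v) jointly Gaussian with covariance K
        K = norm * (np.sin(theta) + (np.pi - theta) * cos) / np.pi
    return K

def empirical_relu_kernel(X, depth, width, rng):
    """Last-layer empirical kernel of one random ReLU MLP of finite width."""
    h = X
    for _ in range(depth):
        W = rng.standard_normal((h.shape[1], width))
        pre = h @ W / np.sqrt(h.shape[1])         # pre-activations
        h = np.sqrt(2.0) * np.maximum(pre, 0.0)   # variance-2 ReLU features
    return h @ h.T / width

def relative_error(K_emp, K_ref):
    """Relative Frobenius distance between two kernels."""
    return np.linalg.norm(K_emp - K_ref) / np.linalg.norm(K_ref)

rng = np.random.default_rng(0)
P, D, depth = 20, 50, 3                           # hypothetical toy sizes
X = rng.standard_normal((P, D))

K_inf = nngp_relu_kernel(X, depth)
errors = {
    N: relative_error(empirical_relu_kernel(X, depth, N, rng), K_inf)
    for N in (64, 4096)
}
for N, err in errors.items():
    print(f"width N={N:5d}  P/N={P / N:.4f}  relative kernel error={err:.4f}")
```

In this toy setting the empirical kernel approaches the NNGP kernel as the width grows at fixed P. In the proportional regime the ratio P/N stays finite, so these fluctuations do not vanish, which is the situation the equivalent Wishart Ansatz is introduced to describe.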
Problem

Research questions and friction points this paper is trying to address.

proportional-width regime
Bayesian deep neural networks
generalization performance
empirical kernels
finite-width effects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wishart Ansatz
Kernel Renormalization
Proportional Regime
Bayesian Deep Learning
NNGP Kernel
Paolo Baglioni
INFN, Sezione di Milano Bicocca, Piazza della Scienza 3, 20126, Milano, Italy; INFN, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy
Christian Keup
Postdoc at University of Parma, Italy
Theory of artificial and biological neuronal networks
Vincenzo Zimbardo
INFN, Sezione di Milano Bicocca, Piazza della Scienza 3, 20126, Milano, Italy; INFN, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy; Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma, Parco Area delle Scienze, 7/A 43124 Parma, Italy
Rosalba Pacelli
INFN, Sezione di Padova, Via Marzolo 8, 35131 Padova, Italy
Alessandro Vezzani
IMEM CNR
Statistical physics
Raffaella Burioni
University of Parma, Italy
Theoretical Physics, Statistical physics, Graph and Network Theory
Pietro Rotondo
INFN, Sezione di Milano Bicocca, Piazza della Scienza 3, 20126, Milano, Italy; INFN, Gruppo Collegato di Parma, Parco Area delle Scienze 7/A, 43124 Parma, Italy; Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma, Parco Area delle Scienze, 7/A 43124 Parma, Italy