🤖 AI Summary
This work studies Bayes-optimal learning of two-layer fully connected neural networks in the teacher–student setting, aiming to narrow the gap between theoretical analysis and the networks used in practice. The focus is the proportional regime, where the hidden-layer width scales linearly with the input dimension and the model operates near interpolation (the number of trainable parameters is comparable to the number of samples). Combining statistical-mechanics mean-field theory, random matrix analysis, and exact Bayesian inference, the analysis characterizes, for the first time, a "specialisation" phase transition: with scarce data the student captures only non-linear combinations of the teacher's weights, and only beyond a critical amount of data do its hidden units align individually with the teacher's. The work identifies multiple learning transitions as the sample size grows, quantifies the minimal sample complexity at which features become learnable, and shows that features (hidden neurons of the target) contributing more strongly to the observed responses require less data to learn. Although specialisation is achievable in the Bayes-optimal setting once enough data is available, it may remain out of reach for practical inference algorithms, possibly because of a statistical-to-computational gap.
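To make the scalings concrete, the following back-of-the-envelope count sketches the proportional and interpolation regimes described above. The conventions (first-layer weights dominating the parameter count, the symbols $\gamma$ and $\alpha$) are assumptions for illustration, not notation taken from the paper.

```latex
% Assumed sketch: input dimension d, hidden width k, number of samples n.
k = \gamma d, \qquad \gamma = \Theta(1)
% First-layer weights dominate the parameter count:
\#\mathrm{params} \approx k \, d = \gamma d^{2}
% Interpolation regime: data and parameters are comparable, so
n = \Theta(d^{2}), \qquad \alpha := \frac{n}{d^{2}} = \Theta(1)
```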
📝 Abstract
For three decades, statistical mechanics has provided a framework to analyse neural networks. However, the theoretically tractable models, e.g., perceptrons, random-feature models and kernel machines, or multi-index models and committee machines with few neurons, have remained simple compared to those used in applications. In this paper we help reduce the gap between practical networks and their theoretical understanding through a statistical physics analysis of the supervised learning of a two-layer fully connected network with generic weight distribution and activation function, whose hidden layer is large but remains proportional to the input dimension. This makes it more realistic than infinitely wide networks, where no feature learning occurs, but also more expressive than narrow networks or networks with fixed inner weights. We focus on Bayes-optimal learning in the teacher-student scenario, i.e., with a dataset generated by another network with the same architecture. We operate around interpolation, where the numbers of trainable parameters and of data are comparable and feature learning emerges. Our analysis uncovers a rich phenomenology with various learning transitions as the number of data points increases. In particular, the more strongly the features (i.e., hidden neurons of the target) contribute to the observed responses, the less data is needed to learn them. Moreover, when data is scarce, the model only learns non-linear combinations of the teacher weights, rather than "specialising" by aligning its weights with the teacher's. Specialisation occurs only when enough data becomes available, but it can be hard to reach for practical training algorithms, possibly due to statistical-to-computational gaps.
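As a concrete illustration of the data-generating setup described in the abstract, here is a minimal sketch of a teacher-student experiment in the proportional regime. The Gaussian inputs and weights, the error-function activation, and the specific normalisations are assumptions chosen for illustration; the paper allows generic weight distributions and activation functions, and analyses Bayes-optimal (posterior) inference rather than any particular training algorithm.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)

d = 200                  # input dimension
gamma = 0.5              # assumed hidden-width / input-dimension ratio
k = int(gamma * d)       # hidden-layer width, proportional to d
alpha = 1.0              # assumed samples per first-layer parameter
n = int(alpha * k * d)   # interpolation regime: n comparable to #parameters

# Teacher: a two-layer fully connected network (assumed Gaussian weights
# and erf activation; the paper covers generic choices).
W_teacher = rng.standard_normal((k, d))   # first-layer weights
a_teacher = rng.standard_normal(k)        # second-layer (readout) weights

def two_layer(X, W, a):
    """Row-wise y = a . erf(W x / sqrt(d)) / sqrt(k)."""
    pre = X @ W.T / np.sqrt(d)            # hidden-layer pre-activations
    return erf(pre) @ a / np.sqrt(len(a)) # readout, normalised by sqrt(k)

# Dataset: i.i.d. standard Gaussian inputs, labels produced by the teacher.
X = rng.standard_normal((n, d))
y = two_layer(X, W_teacher, a_teacher)

# A student with the same architecture would then be inferred from (X, y);
# Bayes-optimal learning corresponds to sampling the posterior over its
# weights given this dataset, which is where the specialisation transition
# discussed above appears as the sample size n grows.
print(X.shape, y.shape)   # (n, d), (n,)
```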