Generalization performance of narrow one-hidden layer networks in the teacher-student setting

📅 2025-07-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work establishes a generalization theory for narrow one-hidden-layer neural networks in the teacher-student setting, where "narrow" means a large number of hidden units that nonetheless remains much smaller than the input dimension, addressing the lack of a systematic treatment of generic activation functions and finite-width networks in prior studies. Methodologically, it uses statistical physics techniques to analytically characterize the weight statistics of networks trained with either noisy full-batch gradient descent (Langevin dynamics) or plain full-batch gradient descent. The analysis yields closed-form expressions for the generalization error, valid for any smooth activation function, describing the typical behavior of both finite-temperature (Bayesian) and empirical risk minimization estimators, and it reveals a specialization phase transition of the hidden neurons once the number of samples becomes proportional to the number of network parameters. Theoretical predictions agree closely with empirical results on both regression and classification tasks. Overall, the work provides a unified, computationally tractable analytical framework for understanding generalization in shallow neural networks.
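To fix ideas, here is a minimal sketch of the objects this summary refers to, in generic notation that may not match the paper's exact conventions: the finite-temperature (Bayesian) estimator samples student weights from a Gibbs measure over the training loss, and the generalization error compares student and teacher on fresh inputs.

```latex
% Generic notation (illustrative, not necessarily the paper's definitions):
% W^* teacher weights, W student weights, (x^\mu, y^\mu) the n training samples,
% \beta the inverse temperature, P_0 a prior over weights, \ell a loss function.
P(W \mid \mathcal{D}) \propto P_0(W)\,
  \exp\!\Big(-\beta \sum_{\mu=1}^{n} \ell\big(y^{\mu},\, f_W(x^{\mu})\big)\Big),
\qquad
\varepsilon_g = \mathbb{E}_{x}\Big[\big(f_{W^{*}}(x) - f_W(x)\big)^{2}\Big].
```

The empirical risk minimization estimator corresponds to the zero-temperature limit β → ∞ of this measure.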

📝 Abstract
Understanding the generalization abilities of neural networks for simple input-output distributions is crucial to account for their learning performance on real datasets. The classical teacher-student setting, where a network is trained from data obtained thanks to a label-generating teacher model, serves as a perfect theoretical test bed. In this context, a complete theoretical account of the performance of fully connected one-hidden layer networks in the presence of generic activation functions is lacking. In this work, we develop such a general theory for narrow networks, i.e. networks with a large number of hidden units, yet much smaller than the input dimension. Using methods from statistical physics, we provide closed-form expressions for the typical performance of both finite temperature (Bayesian) and empirical risk minimization estimators, in terms of a small number of weight statistics. In doing so, we highlight the presence of a transition where hidden neurons specialize when the number of samples is sufficiently large and proportional to the number of parameters of the network. Our theory accurately predicts the generalization error of neural networks trained on regression or classification tasks with either noisy full-batch gradient descent (Langevin dynamics) or full-batch gradient descent.
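To make the training protocol concrete, below is a minimal numerical sketch of the teacher-student experiment described in the abstract: a narrow one-hidden-layer student trained with noisy full-batch gradient descent (Langevin dynamics). All sizes and hyperparameters (d, K, n, lr, T, the tanh activation, fixed second layer) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the paper): input dimension d,
# K hidden units with K << d, and n training samples.
d, K, n = 500, 4, 8000

def net(X, W, a):
    """One-hidden-layer network f(x) = sum_k a_k * g(w_k . x / sqrt(d)),
    with g = tanh standing in for a generic smooth activation."""
    return np.tanh(X @ W.T / np.sqrt(d)) @ a

# Teacher: fixed random weights that generate the labels.
W_star = rng.standard_normal((K, d))
a_star = np.ones(K)
X = rng.standard_normal((n, d))
y = net(X, W_star, a_star)          # regression labels; take np.sign(...) for classification

# Student trained with noisy full-batch gradient descent (Langevin dynamics):
#   W <- W - lr * grad(loss) + sqrt(2 * lr * T) * xi,   where T = 0 gives plain gradient descent.
W = rng.standard_normal((K, d))
a = np.ones(K)                      # second layer kept fixed for simplicity (an assumption)
lr, T, steps = 0.05, 1e-3, 2000

for _ in range(steps):
    pre = X @ W.T / np.sqrt(d)                    # (n, K) preactivations
    err = net(X, W, a) - y                        # (n,) residuals
    # Gradient of the mean squared loss with respect to the first-layer weights.
    grad = (err[:, None] * (1 - np.tanh(pre) ** 2) * a).T @ X / (n * np.sqrt(d))
    W += -lr * grad + np.sqrt(2 * lr * T) * rng.standard_normal(W.shape)

# Monte Carlo estimate of the generalization error on fresh teacher-labelled inputs.
X_test = rng.standard_normal((20000, d))
eps_g = np.mean((net(X_test, W, a) - net(X_test, W_star, a_star)) ** 2)
print(f"estimated generalization error: {eps_g:.4f}")
```

Setting T = 0 recovers plain full-batch gradient descent, the other training protocol considered in the abstract.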
Problem

Research questions and friction points this paper is trying to address.

Understanding generalization in narrow one-hidden layer networks
Developing theory for networks with generic activation functions
Predicting generalization error in regression and classification tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

General theory for narrow one-hidden layer networks
Closed-form expressions using statistical physics methods
Transition analysis of hidden neuron specialization (see the sketch after this list)
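Specialization here refers to individual student neurons aligning with individual teacher neurons. A hedged sketch of how one can diagnose it numerically, using a teacher-student overlap matrix as a generic order parameter (not necessarily the paper's exact definition):

```python
import numpy as np

def overlap_matrix(W_student, W_teacher):
    """K x K matrix of normalized teacher-student overlaps m[k, l] = w_k . w*_l / d.

    Specialized phase: each row has a single dominant entry (one teacher neuron per
    student neuron, up to a permutation of the hidden units).
    Unspecialized phase: entries within a row are of comparable magnitude.
    """
    d = W_student.shape[1]
    return W_student @ W_teacher.T / d

# Example usage with W (student) and W_star (teacher) from the training sketch above:
# m = overlap_matrix(W, W_star)
# print(np.round(m, 2))   # inspect whether the matrix is close to a (signed) permutation
```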
Jean Barbier
Associate Professor, International Center for Theoretical Physics
high-dimensional statistics, machine learning, information theory, spin glasses, random matrices
Federica Gerace
Department of Mathematics, University of Bologna, Piazza di Porta San Donato 5, 40126, Bologna (BO), Italy
Alessandro Ingrosso
Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
Clarissa Lauditi
Postdoctoral Fellow, Harvard University
Statistical Physics, Disordered Systems, Machine Learning
Enrico M. Malatesta
Assistant Professor, Bocconi University
Statistical Physics, Machine Learning, Disordered Systems, Statistical Field Theory
Gibbs Nwemadji
International School of Advanced Studies (SISSA), Trieste, Italy
Rodrigo Pérez Ortiz
Alma Mater Studiorum – Università di Bologna (Unibo), IT-40126 Bologna, Italy