🤖 AI Summary
This work investigates the training dynamics and generalization behavior of overparameterized shallow neural networks with quadratic activation in a teacher-student setup. In the extensive-width regime, where the teacher and student widths scale proportionally with the input dimension and the sample size grows quadratically in it, the authors combine dynamical mean-field theory (DMFT) with high-dimensional probability to characterize the high-dimensional limit of the gradient flow dynamics. Adding ℓ₂ regularization, they analyze the long-time behavior of these equations and derive the performance and spectral properties of the resulting estimator. Key contributions include uncovering a double descent phenomenon in the presence of label noise, where generalization continues to improve beyond the interpolation threshold, and an exact expression, in the small-regularization limit, for the perfect-recovery threshold as a function of the network widths, giving a precise, quantitative picture of how overparameterization shapes learning and recovery.
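For concreteness, here is a hedged sketch of the model this summary refers to; the normalizations and the symbols κ⋆, κ, α, Δ, λ below are illustrative assumptions, not necessarily the paper's exact conventions. The teacher generates noisy labels and the student fits a predictor of the same form:

$$
y^\mu = \frac{1}{m}\sum_{j=1}^{m}\langle \theta_j^{\star}, x^\mu\rangle^{2} + \sqrt{\Delta}\,\xi^\mu,
\qquad
\hat f(x) = \frac{1}{p}\sum_{i=1}^{p}\langle w_i, x\rangle^{2},
$$

with inputs $x^\mu \in \mathbb{R}^d$, widths $m = \kappa_\star d$ and $p = \kappa d$, sample size $n = \alpha d^2$, and gradient flow on the regularized empirical risk $\frac{1}{2n}\sum_{\mu=1}^{n}\big(\hat f(x^\mu) - y^\mu\big)^2 + \frac{\lambda}{2}\sum_{i=1}^{p}\|w_i\|_2^2$.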
📝 Abstract
We study the high-dimensional training dynamics of a shallow neural network with quadratic activation in a teacher-student setup. We focus on the extensive-width regime, where the teacher and student network widths scale proportionally with the input dimension and the sample size grows quadratically in it. This scaling aims to describe overparameterized neural networks in which feature learning still plays a central role. In the high-dimensional limit, we derive a dynamical characterization of the gradient flow, in the spirit of dynamical mean-field theory (DMFT). Under ℓ₂ regularization, we analyze these equations at long times and characterize the performance and spectral properties of the resulting estimator. This result provides a quantitative understanding of the effect of overparameterization on learning and generalization, and reveals a double descent phenomenon in the presence of label noise, where generalization improves beyond interpolation. In the small-regularization limit, we obtain an exact expression for the perfect-recovery threshold as a function of the network widths, providing a precise characterization of how overparameterization influences recovery.
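As a hedged illustration of this setup, the following minimal numpy sketch simulates a finite-size instance: a quadratic-activation teacher generates noisy labels, and a wider student is trained by discretized gradient flow on the ℓ₂-regularized square loss. The normalizations, widths, and hyperparameters are assumptions chosen for readability, not the paper's exact conventions.

```python
# Finite-size sketch of the teacher-student setup (assumptions: unit
# second-layer weights, 1/width output normalization, square loss).
import numpy as np

rng = np.random.default_rng(0)

d = 30                    # input dimension
m, p = d, 2 * d           # teacher / student widths, proportional to d
n = 4 * d * d             # sample size, quadratic in d
noise_std = 0.1           # label-noise standard deviation
lam = 1e-3                # l2-regularization strength
lr, steps = 0.05, 1000    # gradient-flow discretization

def net(W, X):
    """Shallow quadratic-activation network: (1/width) * sum_j <w_j, x>^2."""
    return np.mean((X @ W.T) ** 2, axis=1)

W_star = rng.standard_normal((m, d)) / np.sqrt(d)   # teacher weights
X = rng.standard_normal((n, d))                     # training inputs
y = net(W_star, X) + noise_std * rng.standard_normal(n)

W = rng.standard_normal((p, d)) / np.sqrt(d)        # student initialization
for _ in range(steps):
    Z = X @ W.T                                     # pre-activations, shape (n, p)
    r = np.mean(Z ** 2, axis=1) - y                 # residuals
    # gradient of (1/2n) sum_mu r_mu^2 + (lam/2) ||W||_F^2 w.r.t. W
    grad = (2.0 / (n * p)) * (r[:, None] * Z).T @ X + lam * W
    W -= lr * grad

X_test = rng.standard_normal((2000, d))
gen_err = np.mean((net(W, X_test) - net(W_star, X_test)) ** 2)
print(f"generalization error: {gen_err:.4f}")
```

At these small sizes the asymptotic DMFT predictions hold only approximately; the sketch is meant to make the scalings (widths ∝ d, samples ∝ d²) and the regularized gradient-flow training concrete, not to reproduce the paper's curves.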