🤖 AI Summary
This work investigates how neural network width governs training dynamics. For single-hidden-layer linear networks, the authors derive the first exact analytical solution of the learning dynamics at arbitrary finite width, giving a unified characterization of the two-phase evolution (kernel learning and feature learning) and a complete phase diagram parameterized by width, layer-wise learning rates, and initialization scale. Methodologically, the work combines analytical dynamical-systems analysis, phase-diagram modeling, and empirical validation on nonlinear networks. Crucially, it identifies three novel mechanisms operative during the feature-learning phase: learning by alignment, learning by disalignment, and learning by rescaling. Each transcends the conventional kernel-method paradigm. These theoretical insights are reproduced empirically in realistic deep networks, offering a new conceptual framework for understanding training dynamics and for designing adaptive optimization algorithms.
📝 Abstract
Understanding the dynamics of neural networks in different width regimes is crucial for improving their training and performance. We present an exact solution for the learning dynamics of a one-hidden-layer linear network with one-dimensional data at any finite width, one that uniquely exhibits both the kernel and feature learning phases. This technical advance enables analysis of the training trajectory from any initialization and yields a detailed phase diagram over common hyperparameters such as width, layer-wise learning rates, and the scales of the output and initialization. We identify three novel prototype mechanisms specific to the feature learning regime: (1) learning by alignment, (2) learning by disalignment, and (3) learning by rescaling, which contrast starkly with the dynamics observed in the kernel regime. Our theoretical findings are substantiated by empirical evidence that these mechanisms also manifest in deep nonlinear networks on real-world tasks, enhancing our understanding of neural network training dynamics and guiding the design of more effective learning strategies.
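To make the setting concrete, here is a minimal NumPy sketch (not the paper's actual derivation; width, learning rate, target slope, and the NTK-style `1/sqrt(width)` initialization are illustrative assumptions) of the toy model the abstract describes: a one-hidden-layer linear network on one-dimensional data, trained by gradient descent from a large versus a small initialization. The large-initialization run stays near the kernel ("lazy") regime, while the small-initialization run shows learning by alignment, with the two weight vectors becoming nearly parallel.

```python
import numpy as np

def train(init_scale, width=64, steps=2000, lr=0.05, seed=0):
    """Gradient descent on f(x) = (w2 @ w1) * x with squared loss.

    Targets the linear map y = beta * x; `init_scale`, `width`, `lr`,
    and `beta` are illustrative choices, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    beta = 2.0                                        # target slope
    w1 = init_scale * rng.standard_normal(width) / np.sqrt(width)
    w2 = init_scale * rng.standard_normal(width) / np.sqrt(width)
    for _ in range(steps):
        err = w2 @ w1 - beta                          # residual of effective slope
        # Exact gradients of 0.5 * err**2 w.r.t. w1 and w2:
        w1, w2 = w1 - lr * err * w2, w2 - lr * err * w1
    return w1, w2

# Large init approximates the kernel regime: weights barely move, so the
# hidden representation (here, the direction of w1) stays close to random.
# Small init triggers feature learning: w1 and w2 align before fitting.
for scale in (2.0, 0.01):
    w1, w2 = train(scale)
    cos = w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2))
    print(f"init={scale:<5} effective slope={w1 @ w2:.3f}  cos(w1, w2)={cos:.3f}")
```

Both runs fit the target slope, but only the small-initialization run ends with `cos(w1, w2)` near 1, a toy signature of the alignment mechanism; the rescaling and disalignment mechanisms appear in richer versions of this model.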