🤖 AI Summary
This work addresses the limited theoretical understanding of how feature learning reshapes the function space induced by a neural network, relative to fixed-kernel methods. Using high-dimensional asymptotic analysis of gradient descent together with random matrix theory, the study shows that training a two-layer network adapts its kernel through data-dependent changes in the feature distribution. Specifically, in the high-dimensional proportional regime, the authors prove that after a large gradient step the post-update feature distribution is well approximated by a target-dependent spiked Gaussian covariance, so feature learning can be read as a transformation of the distribution in either parameter space or input space. The effect is not a mere rescaling of a fixed kernel: the induced kernel selectively amplifies eigenvalues aligned with the target direction and couples the top radial mode with a target-aligned quadratic harmonic, deforming the spectral geometry of the function space toward directions that carry the signal.
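The one-step mechanism can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a hypothetical toy setup with isotropic Gaussian inputs, a single-index target `y = tanh(2 * beta . x)`, a two-layer network `f(x) = (1/N) * sum_j a_j * tanh(w_j . x)` with a fixed ±1 second layer, and a step size of order `N`. The names `one_large_step` and `top_direction` are illustrative. It takes one full-batch gradient step on the first-layer weights and then checks whether the empirical weight covariance `W^T W / N` develops an outlier eigenvalue whose eigenvector aligns with `beta`.

```python
import numpy as np


def one_large_step(n=4000, d=400, N=400, seed=0):
    """Hypothetical toy setup: one large gradient step on the first layer of
    f(x) = (1/N) * sum_j a_j * tanh(w_j . x), fitted to a single-index target."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)                     # unit target direction
    X = rng.standard_normal((n, d))                  # isotropic Gaussian inputs
    y = np.tanh(2.0 * X @ beta)                      # assumed single-index target
    W0 = rng.standard_normal((N, d)) / np.sqrt(d)    # rows w_j ~ N(0, I/d)
    a = rng.choice([-1.0, 1.0], size=N)              # fixed second layer

    Z = X @ W0.T                                     # pre-activations, shape (n, N)
    resid = y - np.tanh(Z) @ a / N                   # y - f(x) at initialization
    sech2 = 1.0 / np.cosh(Z) ** 2                    # tanh'(z)
    # Gradient of (1/2n) * sum_i resid_i^2 with respect to W, shape (N, d)
    grad = -((resid[:, None] * sech2) * (a / N)[None, :]).T @ X / n
    eta = float(N)                                   # "large" step of order N (assumed scaling)
    W1 = W0 - eta * grad
    return beta, W0, W1


def top_direction(W):
    """Eigenvalues (descending) and leading eigenvector of W^T W / N."""
    S = W.T @ W / W.shape[0]
    vals, vecs = np.linalg.eigh(S)                   # eigh returns ascending eigenvalues
    return vals[::-1], vecs[:, -1]


if __name__ == "__main__":
    beta, W0, W1 = one_large_step()
    for name, W in [("init", W0), ("after step", W1)]:
        vals, v = top_direction(W)
        print(name, "top/second eigenvalue:", vals[0] / vals[1],
              "overlap with beta:", abs(v @ beta))
```

At initialization the leading eigenvector should be essentially unrelated to `beta`; after the step, a well-separated outlier eigenvalue should appear with an eigenvector close to `beta`, which is the spiked-covariance picture in miniature. Exact numbers depend on the assumed scalings and seed.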
📝 Abstract
Feature learning is widely regarded as the key mechanism distinguishing neural networks from fixed-kernel methods, yet its impact on the induced function space remains poorly understood. In this work, we precisely characterize how the function space spanned by the features of a two-layer neural network evolves during gradient descent training. We prove that, in the high-dimensional proportional regime, after a large gradient step the post-update feature distribution is well approximated by a target-dependent spiked Gaussian covariance. This induces a data-adaptive kernel that reshapes the function space and modifies its spectral structure. Our analysis reveals that feature learning can be interpreted as a distributional transformation in either parameter space or input space, or, equivalently, as the introduction of a target-dependent kernel. In particular, it selectively amplifies eigenvalues aligned with the target direction and mixes leading eigenfunctions, coupling the top radial mode with a target-aligned quadratic harmonic. Overall, our results provide a precise function-space perspective on early-stage feature learning: rather than just rescaling a fixed kernel, gradient descent induces a data-adaptive deformation that preferentially enhances directions aligned with the signal in the data.
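To see how a spiked weight distribution changes the induced kernel, the sketch below compares a random-feature approximation of `K(x, x') = E_w[relu(w . x) relu(w . x')]` under isotropic weights `w ~ N(0, I/d)` and under a spiked covariance `I/d + spike * beta beta^T`. It is a hypothetical illustration, not the paper's construction: the spike is imposed directly as a stand-in for the post-step distribution, and the ReLU activation, the dimensions, and the helper name `harmonic_rayleigh` are assumptions. It measures the Rayleigh quotient of the empirical kernel operator on the degree-2 Hermite harmonic `(u . x)^2 - 1`, for `u = beta` and for an orthogonal direction `u = gamma`.

```python
import numpy as np


def harmonic_rayleigh(spike=0.0, n=8000, d=20, M=1000, seed=1):
    """Rayleigh quotients of the random-feature kernel K = Phi Phi^T / M on two
    degree-2 harmonics: one along beta, one along an orthogonal direction gamma.
    Weights are drawn as w ~ N(0, I/d + spike * beta beta^T) (hypothetical spike)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(d)
    beta[0] = 1.0                                     # assumed target direction
    gamma = np.zeros(d)
    gamma[1] = 1.0                                    # orthogonal comparison direction
    X = rng.standard_normal((n, d))                   # isotropic Gaussian inputs
    # Sample weights from the spiked covariance I/d + spike * beta beta^T
    W = rng.standard_normal((M, d)) / np.sqrt(d)
    W += np.sqrt(spike) * rng.standard_normal((M, 1)) * beta[None, :]
    Phi = np.maximum(X @ W.T, 0.0)                    # ReLU features, shape (n, M)

    def quotient(u):
        h = (X @ u) ** 2 - 1.0                        # He_2 harmonic along u
        proj = Phi.T @ h / n                          # per-feature projections
        # <h, T h> / <h, h> for the empirical kernel operator T
        return np.mean(proj ** 2) / np.mean(h ** 2)

    return quotient(beta), quotient(gamma)


if __name__ == "__main__":
    for s in (0.0, 1.0):
        rb, rg = harmonic_rayleigh(spike=s)
        print(f"spike={s}: along beta {rb:.2e}, orthogonal {rg:.2e}")
```

Under isotropic weights the two quotients should be close, since rotation invariance gives every quadratic harmonic the same eigenvalue; with the spike, the quotient along `beta` should grow by well over an order of magnitude while the orthogonal one stays small. ReLU is used because it has a nonzero degree-2 Hermite component; an odd activation such as `tanh` would give zero projection onto quadratic harmonics. This sketch only illustrates eigenvalue amplification, not the coupling with the radial mode.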