🤖 AI Summary
Problem: Existing fixed-kernel theories, such as the neural tangent kernel (NTK) theory, fail to capture the adaptive, dynamic feature learning that underlies neural network generalization.
Method: We propose an over-parameterized Gaussian sequence model that admits a closed-form characterization of feature evolution during training (see the illustrative sketch below). Building on this, we develop a statistical analysis framework that goes beyond the NTK regime.
Contribution/Results: Our theory rigorously establishes quantitative links among the feature evolution rate, representational capacity, and generalization error, without requiring infinite-width limits. It applies to practically sized architectures beyond the asymptotic “wide-network” regime, providing the first tractable theoretical prototype for adaptive feature learning in deep neural networks. By shifting focus from static kernel-based representations to dynamically evolving features, our work advances representation learning theory from the “fixed-kernel paradigm” toward a “dynamic-feature paradigm.”
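For concreteness, the sketch below gives a standard over-parameterized Gaussian sequence model of the kind named above. The notation ($\theta_j$, $\sigma$, $n$, $p$) and the specific form are illustrative assumptions; the paper's exact formulation and its feature-evolution parameterization are not reproduced in this summary.

```latex
% Illustrative sketch only (assumed textbook form, not the paper's exact model):
% observe noisy coefficients y_j of an unknown sequence theta = (theta_1, ..., theta_p),
% with more coefficients p than the effective sample size n (over-parameterization).
y_j \;=\; \theta_j + \frac{\sigma}{\sqrt{n}}\,\varepsilon_j,
\qquad \varepsilon_j \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0,1),
\qquad j = 1, \dots, p, \quad p \gg n.

% The quantity of interest is typically the estimation/generalization risk of an
% estimator \hat\theta(y), which decomposes coordinate-wise:
\mathcal{R}(\hat\theta)
  \;=\; \mathbb{E}\,\bigl\lVert \hat\theta - \theta \bigr\rVert_2^2
  \;=\; \sum_{j=1}^{p} \mathbb{E}\bigl(\hat\theta_j - \theta_j\bigr)^2 .
```

The coordinate-wise risk decomposition is what makes sequence models tractable in closed form, which is the property the summary appeals to when calling the model a prototype for feature learning.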
📝 Abstract
A primary advantage of neural networks lies in their feature learning characteristics, which are challenging to analyze theoretically due to the complexity of their training dynamics. We propose a new paradigm for studying feature learning and the resulting benefits in generalizability. After reviewing neural tangent kernel (NTK) theory and recent results in kernel regression, which address the generalization problem for sufficiently wide neural networks, we examine the limitations and implications of fixed-kernel theory (such as NTK theory) and review recent theoretical advances in feature learning. Moving beyond fixed kernel/feature theory, we consider neural networks as adaptive feature models. Finally, we propose an over-parameterized Gaussian sequence model as a prototype model for studying the feature learning characteristics of neural networks.
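For background, the fixed-kernel picture reviewed in the abstract can be summarized as follows. This is the standard NTK statement from the literature (for squared loss, gradient flow, and suitable initialization and scaling), not a result derived in this abstract.

```latex
% Standard NTK background (not the paper's own derivation): linearizing a network
% f(x; \theta) around its initialization \theta_0 yields the fixed tangent kernel
K_{\mathrm{NTK}}(x, x')
  \;=\; \bigl\langle \nabla_\theta f(x; \theta_0),\; \nabla_\theta f(x'; \theta_0) \bigr\rangle .

% In the infinite-width limit (squared loss, gradient flow, suitable scaling), the
% trained network's predictor converges to kernel regression with this fixed kernel
% on the training data (X, y):
\hat f(x) \;=\; K_{\mathrm{NTK}}(x, X)\,\bigl[K_{\mathrm{NTK}}(X, X)\bigr]^{-1} y .

% The features \nabla_\theta f(\cdot; \theta_0) never move during training in this
% regime; the adaptive feature models above replace them with features that evolve.
```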