🤖 AI Summary
This work investigates how network depth affects the limiting behavior of the Neural Tangent Kernel (NTK) in overparameterized fully connected ReLU networks. Methodologically, leveraging the NTK framework and random matrix theory, we rigorously prove that, in the wide-network limit, the normalized NTK degenerates to the all-ones matrix as depth increases, causing network outputs to converge to a deterministic limit on the unit sphere. This degeneration reflects a fundamental mechanism: greater depth drives the NTK toward a uniform, rank-one structure, thereby impairing generalization. Empirically, we evaluate the critical depth scale at which degeneration emerges and identify the structural conditions under which this degeneration extends to other kernels. Our study establishes, for the first time, a quantitative relationship among depth, NTK degeneration, and output convergence, providing a novel theoretical lens for understanding representational bottlenecks and generalization limits in deep neural networks.
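For concreteness, one way to state the degeneration claim is sketched below, assuming Θ_L denotes the limiting NTK of a depth-L network and inputs lie on the unit sphere; the paper's exact normalization may differ.

```latex
% One reading of the degeneration claim (a sketch, not the paper's exact statement):
% for unit-norm inputs x, x', the normalized limiting NTK loses all pairwise
% structure as depth L grows.
\[
  \lim_{L \to \infty}
  \frac{\Theta_L(x, x')}{\sqrt{\Theta_L(x, x)\,\Theta_L(x', x')}} \;=\; 1
  \qquad \text{for all } x, x' \in \mathbb{S}^{d-1},
\]
% i.e. on any finite dataset the normalized kernel matrix approaches the
% all-ones matrix, which is rank one and hence non-invertible in the limit.
```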
📝 Abstract
Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of infinite width and small learning rate, the resulting kernel makes it possible to express the output of the learned model in closed form. This closed-form solution hinges on the invertibility of the limiting kernel, a property that often holds on real-world datasets. In this work, we analyze the sensitivity of large ReLU networks to increasing depth by characterizing the corresponding limiting kernel. Our theoretical results demonstrate that the normalized limiting kernel approaches the matrix of ones. In contrast, we show that the corresponding closed-form solution approaches a fixed limit on the sphere. We empirically evaluate the order of magnitude of network depth required to observe this convergent behavior, and we describe the essential properties that allow our results to generalize to other kernels.
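As a rough numerical illustration (not the paper's experimental setup), the sketch below iterates the standard infinite-width NTK recursion for fully connected ReLU networks, with the usual scaling that keeps diagonal kernel entries at one, and tracks how a normalized off-diagonal entry drifts toward 1 as depth grows. The initial cosine `c0` and the depth grid are arbitrary choices for illustration, and the normalization is an assumption that may differ from the paper's.

```python
import numpy as np

def normalized_ntk(c0: float, depth: int) -> float:
    """Normalized limiting NTK, Theta_L(x, x') / Theta_L(x, x), for two
    unit-norm inputs with inner product c0, using the standard infinite-width
    NTK recursion for fully connected ReLU networks (a sketch, not the
    paper's exact normalization)."""
    rho = c0            # Sigma^{(h)}(x, x'); diagonal entries stay at 1
    theta_off = c0      # Theta^{(0)}(x, x')
    theta_diag = 1.0    # Theta^{(0)}(x, x)
    for _ in range(depth):
        angle = np.arccos(np.clip(rho, -1.0, 1.0))
        sigma_dot = (np.pi - angle) / np.pi                               # derivative kernel Sigma_dot^{(h)}
        rho = (np.sin(angle) + (np.pi - angle) * np.cos(angle)) / np.pi  # next-layer Sigma^{(h)}
        theta_off = rho + theta_off * sigma_dot                          # Theta^{(h)} = Sigma^{(h)} + Theta^{(h-1)} * Sigma_dot^{(h)}
        theta_diag = 1.0 + theta_diag                                    # on the diagonal, Sigma = Sigma_dot = 1
    return theta_off / theta_diag

# Every normalized off-diagonal entry drifts toward 1 with depth, i.e. the
# normalized kernel matrix approaches the matrix of ones.
for L in (1, 10, 100, 1_000, 10_000):
    print(f"depth {L:>6}: normalized NTK = {normalized_ntk(0.2, L):.4f}")
```

The drift toward 1 is slow, which is consistent with the abstract's point that a particular order of magnitude of depth is needed before the convergent behavior becomes visible.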