🤖 AI Summary
This work investigates how network depth affects the limiting behavior of the Neural Tangent Kernel (NTK) in overparameterized fully connected ReLU networks. Methodologically, leveraging the NTK framework and random matrix theory, we rigorously prove that, in the wide-network limit, the normalized NTK degenerates to the all-ones matrix as depth increases, causing network outputs to converge to a deterministic limit on the unit sphere. This degeneration reflects a fundamental mechanism: greater depth drives the NTK toward a uniform, rank-one structure, thereby impairing generalization. Empirically, we evaluate the critical depth scale at which degeneration emerges and identify the structural conditions under which this degeneration extends to other kernels. Our study establishes, for the first time, a quantitative relationship among depth, NTK degeneration, and output convergence, providing a novel theoretical lens for understanding representational bottlenecks and generalization limits in deep neural networks.
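For concreteness, one way to state the degeneration claim is sketched below, assuming Θ_L denotes the limiting NTK of a depth-L network and inputs lie on the unit sphere; the paper's exact normalization may differ.

```latex
% One reading of the degeneration claim (a sketch, not the paper's exact statement):
% for unit-norm inputs x, x', the normalized limiting NTK loses all pairwise
% structure as depth L grows.
\[
  \lim_{L \to \infty}
  \frac{\Theta_L(x, x')}{\sqrt{\Theta_L(x, x)\,\Theta_L(x', x')}} \;=\; 1
  \qquad \text{for all } x, x' \in \mathbb{S}^{d-1},
\]
% i.e. on any finite dataset the normalized kernel matrix approaches the
% all-ones matrix, which is rank one and hence non-invertible in the limit.
```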
📝 Abstract
Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of infinite width and small learning rate, the resulting kernel makes it possible to express the output of the learned model in closed form. This closed-form solution hinges on the invertibility of the limiting kernel, a property that often holds on real-world datasets. In this work, we analyze the sensitivity of large ReLU networks to increasing depth by characterizing the corresponding limiting kernel. Our theoretical results demonstrate that the normalized limiting kernel approaches the matrix of ones. In contrast, we show that the corresponding closed-form solution approaches a fixed limit on the sphere. We empirically evaluate the order of magnitude of network depth required to observe this convergent behavior, and we describe the essential properties that allow our results to generalize to other kernels.
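As a rough numerical illustration (not the paper's experimental setup), the sketch below iterates the standard infinite-width NTK recursion for fully connected ReLU networks, with the usual scaling that keeps diagonal kernel entries at one, and tracks how a normalized off-diagonal entry drifts toward 1 as depth grows. The initial cosine `c0` and the depth grid are arbitrary choices for illustration, and the normalization is an assumption that may differ from the paper's.

```python
import numpy as np

def normalized_ntk(c0: float, depth: int) -> float:
    """Normalized limiting NTK, Theta_L(x, x') / Theta_L(x, x), for two
    unit-norm inputs with inner product c0, using the standard infinite-width
    NTK recursion for fully connected ReLU networks (a sketch, not the
    paper's exact normalization)."""
    rho = c0            # Sigma^{(h)}(x, x'); diagonal entries stay at 1
    theta_off = c0      # Theta^{(0)}(x, x')
    theta_diag = 1.0    # Theta^{(0)}(x, x)
    for _ in range(depth):
        angle = np.arccos(np.clip(rho, -1.0, 1.0))
        sigma_dot = (np.pi - angle) / np.pi                               # derivative kernel Sigma_dot^{(h)}
        rho = (np.sin(angle) + (np.pi - angle) * np.cos(angle)) / np.pi  # next-layer Sigma^{(h)}
        theta_off = rho + theta_off * sigma_dot                          # Theta^{(h)} = Sigma^{(h)} + Theta^{(h-1)} * Sigma_dot^{(h)}
        theta_diag = 1.0 + theta_diag                                    # on the diagonal, Sigma = Sigma_dot = 1
    return theta_off / theta_diag

# Every normalized off-diagonal entry drifts toward 1 with depth, i.e. the
# normalized kernel matrix approaches the matrix of ones.
for L in (1, 10, 100, 1_000, 10_000):
    print(f"depth {L:>6}: normalized NTK = {normalized_ntk(0.2, L):.4f}")
```

The drift toward 1 is slow, which is consistent with the abstract's point that a particular order of magnitude of depth is needed before the convergent behavior becomes visible.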