🤖 AI Summary
Existing NTK theory primarily applies to infinite-width networks and neglects the role of depth in representation learning. This work treats network depth as an explicit variable and proposes the Depth-driven Neural Tangent Kernel (D-NTK): networks equipped with skip connections are mapped to Gaussian processes, with the kernel shown to converge as depth tends to infinity. Theoretically, the authors characterize D-NTK's dynamic stability and spectral properties, proving its training invariance and its ability to suppress feature collapse. By combining function-space analysis with spectral decomposition, they establish a rigorous theoretical link among depth, kernel behavior, and generalization. Empirically, D-NTK consistently outperforms the standard NTK on image classification and regression tasks, demonstrating stronger expressive power and generalization, and thereby bridging the theoretical gap between the infinite-width and finite-depth regimes.
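The central object here, a tangent kernel formed from gradient inner products over a skip-connection architecture, can be made concrete with a small sketch. The snippet below computes the empirical NTK of a toy residual MLP in JAX; the network shape, depth, scaling, and helper names (`init_params`, `forward`, `empirical_ntk`) are illustrative assumptions and do not reproduce the paper's D-NTK construction.

```python
import jax
import jax.numpy as jnp

def init_params(key, in_dim=16, width=64, depth=8):
    """Hypothetical residual MLP: `depth` blocks, each with a skip connection."""
    keys = jax.random.split(key, depth + 2)
    return {
        "w_in": jax.random.normal(keys[0], (in_dim, width)) / jnp.sqrt(in_dim),
        "blocks": [jax.random.normal(k, (width, width)) / jnp.sqrt(width)
                   for k in keys[1:depth + 1]],
        "w_out": jax.random.normal(keys[-1], (width, 1)) / jnp.sqrt(width),
    }

def forward(params, x):
    h = x @ params["w_in"]
    for w in params["blocks"]:
        h = h + jax.nn.relu(h) @ w  # skip connection: identity path plus a learned update
    return (h @ params["w_out"]).squeeze(-1)

def empirical_ntk(params, x1, x2):
    """Kernel matrix of pairwise gradient inner products, Theta(x, x')."""
    def flat_grad(x):
        g = jax.grad(lambda p: forward(p, x[None, :])[0])(params)
        return jnp.concatenate([jnp.ravel(leaf) for leaf in jax.tree_util.tree_leaves(g)])
    j1 = jax.vmap(flat_grad)(x1)  # (n1, n_params)
    j2 = jax.vmap(flat_grad)(x2)  # (n2, n_params)
    return j1 @ j2.T              # (n1, n2) kernel matrix

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jax.random.normal(key, (4, 16))
print(empirical_ntk(params, x, x).shape)  # (4, 4)
```

Studying how this kernel matrix changes with `depth` (and during training) is, roughly, the regime the summary describes; the paper's theoretical results concern its stability and spectrum rather than this finite-sample computation.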
📝 Abstract
While deep learning has achieved remarkable success across a wide range of applications, the theoretical understanding of its representation learning remains limited. Deep neural kernels provide a principled framework for interpreting over-parameterized neural networks by mapping hierarchical feature transformations into kernel spaces, thereby combining the expressive power of deep architectures with the analytical tractability of kernel methods. Recent advances, particularly neural tangent kernels (NTKs) derived from gradient inner products, have established connections between infinitely wide neural networks and nonparametric Bayesian inference. However, the existing NTK paradigm has been largely confined to the infinite-width regime and overlooks the representational role of network depth. To address this gap, we propose a depth-induced NTK built on an architecture with shortcut connections, which converges to a Gaussian process as the network depth approaches infinity. We theoretically analyze the training invariance and spectral properties of the proposed kernel, showing that it stabilizes the kernel dynamics and mitigates degeneration. Experimental results further underscore the effectiveness of the proposed method. Our findings substantially extend the existing landscape of neural kernel theory and provide a deeper understanding of deep learning and scaling laws.
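For concreteness, the gradient inner product kernel referenced above admits a short standard statement (notation chosen here for illustration; this is the conventional NTK, not the paper's depth-induced variant):

$$
\Theta_\theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta),\, \nabla_\theta f(x';\theta) \big\rangle,
\qquad
\dot f_t(x) \;=\; -\sum_{i=1}^{n} \Theta_{\theta_t}(x, x_i)\, \partial_{f_t(x_i)} \mathcal{L},
$$

where $f(\cdot;\theta)$ is the network output, $\{x_i\}_{i=1}^{n}$ the training inputs, and $\mathcal{L}$ the training loss. In the infinite-width limit, $\Theta_{\theta_t}$ remains constant along gradient-flow training; this training-invariance property is what the abstract analyzes for the depth-induced kernel.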