🤖 AI Summary
Existing NTK theory primarily applies to infinite-width networks and neglects the role of depth in representation learning. This work treats network depth as an explicit variable and proposes the Depth-driven Neural Tangent Kernel (D-NTK): networks equipped with skip connections are mapped to Gaussian processes, with the kernel shown to converge as depth tends to infinity. Theoretically, the authors characterize D-NTK's dynamic stability and spectral properties, proving its training invariance and its ability to suppress feature collapse. By combining function-space analysis with spectral decomposition, they establish a rigorous theoretical link among depth, kernel behavior, and generalization. Empirically, D-NTK consistently outperforms the standard NTK on image classification and regression tasks, demonstrating stronger expressive power and generalization, and thereby bridging the theoretical gap between the infinite-width and finite-depth regimes.
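The central object here, a tangent kernel formed from gradient inner products over a skip-connection architecture, can be made concrete with a small sketch. The snippet below computes the empirical NTK of a toy residual MLP in JAX; the network shape, depth, scaling, and helper names (`init_params`, `forward`, `empirical_ntk`) are illustrative assumptions and do not reproduce the paper's D-NTK construction.

```python
import jax
import jax.numpy as jnp

def init_params(key, in_dim=16, width=64, depth=8):
    """Hypothetical residual MLP: `depth` blocks, each with a skip connection."""
    keys = jax.random.split(key, depth + 2)
    return {
        "w_in": jax.random.normal(keys[0], (in_dim, width)) / jnp.sqrt(in_dim),
        "blocks": [jax.random.normal(k, (width, width)) / jnp.sqrt(width)
                   for k in keys[1:depth + 1]],
        "w_out": jax.random.normal(keys[-1], (width, 1)) / jnp.sqrt(width),
    }

def forward(params, x):
    h = x @ params["w_in"]
    for w in params["blocks"]:
        h = h + jax.nn.relu(h) @ w  # skip connection: identity path plus a learned update
    return (h @ params["w_out"]).squeeze(-1)

def empirical_ntk(params, x1, x2):
    """Kernel matrix of pairwise gradient inner products, Theta(x, x')."""
    def flat_grad(x):
        g = jax.grad(lambda p: forward(p, x[None, :])[0])(params)
        return jnp.concatenate([jnp.ravel(leaf) for leaf in jax.tree_util.tree_leaves(g)])
    j1 = jax.vmap(flat_grad)(x1)  # (n1, n_params)
    j2 = jax.vmap(flat_grad)(x2)  # (n2, n_params)
    return j1 @ j2.T              # (n1, n2) kernel matrix

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jax.random.normal(key, (4, 16))
print(empirical_ntk(params, x, x).shape)  # (4, 4)
```

Studying how this kernel matrix changes with `depth` (and during training) is, roughly, the regime the summary describes; the paper's theoretical results concern its stability and spectrum rather than this finite-sample computation.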
📝 Abstract
While deep learning has achieved remarkable success across a wide range of applications, the theoretical understanding of its representation learning remains limited. Deep neural kernels provide a principled framework for interpreting over-parameterized neural networks by mapping hierarchical feature transformations into kernel spaces, thereby combining the expressive power of deep architectures with the analytical tractability of kernel methods. Recent advances, particularly neural tangent kernels (NTKs) derived from gradient inner products, have established connections between infinitely wide neural networks and nonparametric Bayesian inference. However, the existing NTK paradigm has been largely confined to the infinite-width regime and overlooks the representational role of network depth. To address this gap, we propose a depth-induced NTK built on an architecture with shortcut connections, which converges to a Gaussian process as the network depth approaches infinity. We theoretically analyze the training invariance and spectral properties of the proposed kernel, showing that it stabilizes the kernel dynamics and mitigates degeneration. Experimental results further underscore the effectiveness of the proposed method. Our findings substantially extend the existing landscape of neural kernel theory and provide a deeper understanding of deep learning and scaling laws.
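For concreteness, the gradient inner product kernel referenced above admits a short standard statement (notation chosen here for illustration; this is the conventional NTK, not the paper's depth-induced variant):

$$
\Theta_\theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta),\, \nabla_\theta f(x';\theta) \big\rangle,
\qquad
\dot f_t(x) \;=\; -\sum_{i=1}^{n} \Theta_{\theta_t}(x, x_i)\, \partial_{f_t(x_i)} \mathcal{L},
$$

where $f(\cdot;\theta)$ is the network output, $\{x_i\}_{i=1}^{n}$ the training inputs, and $\mathcal{L}$ the training loss. In the infinite-width limit, $\Theta_{\theta_t}$ remains constant along gradient-flow training; this training-invariance property is what the abstract analyzes for the depth-induced kernel.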