Eigenvalue distribution of the Neural Tangent Kernel in the quadratic scaling

📅 2025-08-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the asymptotic eigenvalue distribution of the Neural Tangent Kernel (NTK) for two-layer neural networks in the double-scaling regime $n/(dp) \to \gamma_1$ and $p/d \to \gamma_2$, where $n$ is the sample size, $d$ the input dimension, and $p$ the hidden width. Methodologically, it applies free multiplicative convolution, previously unexploited in NTK spectral analysis, to rigorously characterize the limiting spectrum as the free multiplicative convolution of a Marchenko--Pastur distribution with a deterministic distribution determined by the activation function and the parameter scaling. Leveraging tools from random matrix theory, free probability, and pseudo-Lipschitz function analysis, the authors derive the exact limiting spectral distribution under i.i.d. weight assumptions. The contribution is twofold: first, it provides the first complete asymptotic characterization of the NTK spectrum in this double-scaling regime; second, it extends the applicability of free probability to deep learning theory, establishing mathematical foundations for analyzing training dynamics and generalization in wide neural networks.
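
In symbols, the characterization above can be restated schematically as follows; the notation $\nu_{\sigma,D}$ for the deterministic factor and the parameters of the Marchenko--Pastur law are placeholders, not the paper's exact symbols:

```latex
% Schematic restatement of the limiting NTK spectrum as a free multiplicative
% convolution (\nu_{\sigma,D} is placeholder notation for the deterministic
% distribution determined by the activation \sigma and the matrix D):
\mu_{\mathrm{NTK}} \;=\; \mu_{\mathrm{MP}} \,\boxtimes\, \nu_{\sigma,D},
\qquad\text{where}\qquad
S_{\mu \boxtimes \nu}(z) \;=\; S_{\mu}(z)\, S_{\nu}(z)
% (the free multiplicative convolution \boxtimes is characterized by
%  multiplicativity of the S-transform, a standard fact of free probability).
```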

📝 Abstract
We compute the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific scaling of dimension. Namely, if $X \in \mathbb{R}^{n \times d}$ is an i.i.d. random matrix, $W \in \mathbb{R}^{d \times p}$ is an i.i.d. $\mathcal{N}(0,1)$ matrix and $D \in \mathbb{R}^{p \times p}$ is a diagonal matrix with i.i.d. bounded entries, we consider the matrix \[ \mathrm{NTK} = \frac{1}{d} X X^\top \odot \frac{1}{p}\, \sigma'\!\left( \frac{1}{\sqrt{d}} X W \right) D^2\, \sigma'\!\left( \frac{1}{\sqrt{d}} X W \right)^{\top} \] where $\sigma'$ is a pseudo-Lipschitz function applied entrywise and under the scaling $\frac{n}{dp} \to \gamma_1$ and $\frac{p}{d} \to \gamma_2$. We describe the asymptotic distribution as the free multiplicative convolution of the Marchenko--Pastur distribution with a deterministic distribution depending on $\sigma$ and $D$.
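
As a concrete numerical illustration of the matrix in the abstract, the sketch below samples the NTK under the stated double scaling and computes its spectrum; the choices $\sigma = \tanh$ (so that $\sigma'$ is Lipschitz, hence pseudo-Lipschitz), the uniform law for the entries of $D$, and the specific values of $\gamma_1, \gamma_2$ are assumptions made only for this demonstration.

```python
import numpy as np

# Dimensions chosen so that n/(d*p) ~ gamma_1 and p/d ~ gamma_2 (illustrative values only).
d, gamma_1, gamma_2 = 50, 0.5, 2.0
p = int(gamma_2 * d)        # hidden width, p/d -> gamma_2
n = int(gamma_1 * d * p)    # sample size, n/(d*p) -> gamma_1 (so n grows quadratically in d)

rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))             # i.i.d. data matrix
W = rng.standard_normal((d, p))             # i.i.d. N(0,1) weight matrix
D = np.diag(rng.uniform(0.5, 1.5, size=p))  # diagonal with i.i.d. bounded entries (assumed law)

# Assumption for the demo: sigma = tanh, so sigma'(z) = 1 - tanh(z)^2 (pseudo-Lipschitz).
def sigma_prime(z):
    return 1.0 - np.tanh(z) ** 2

S = sigma_prime(X @ W / np.sqrt(d))          # sigma'(XW/sqrt(d)), applied entrywise
ntk = (X @ X.T / d) * (S @ D**2 @ S.T / p)   # Hadamard (entrywise) product from the abstract

eigs = np.linalg.eigvalsh(ntk)               # NTK is symmetric, so the spectrum is real
print("spectrum range:", eigs.min(), eigs.max())
```

A histogram of `eigs` for growing $d$ would be the empirical counterpart of the limiting distribution described in the abstract.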
Problem

Research questions and friction points this paper is trying to address.

Analyzing the eigenvalue distribution of the neural tangent kernel (NTK)
Studying two-layer networks under the quadratic scaling $n \asymp dp$, $p \asymp d$ (so $n \asymp d^2$)
Deriving the asymptotic spectral distribution via free multiplicative convolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact asymptotic characterization of the NTK eigenvalue distribution
Free multiplicative convolution technique from free probability
Limiting spectrum expressed as a Marchenko--Pastur law convolved with a deterministic distribution depending on $\sigma$ and $D$
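
For context on the Marchenko--Pastur factor of the convolution, a minimal sketch of its bulk density is below; how its ratio parameter relates to $\gamma_1$ and $\gamma_2$ in the paper is not specified here, so the parameter is kept generic, and the possible atom at zero (for ratio $> 1$) is omitted.

```python
import numpy as np

def marchenko_pastur_density(x, ratio):
    """Bulk density of the Marchenko--Pastur law with the given ratio parameter
    (unit variance); the atom at zero that appears when ratio > 1 is not included."""
    lam_minus = (1.0 - np.sqrt(ratio)) ** 2
    lam_plus = (1.0 + np.sqrt(ratio)) ** 2
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > lam_minus) & (x < lam_plus)   # density supported on [lam_minus, lam_plus]
    dens[inside] = np.sqrt((lam_plus - x[inside]) * (x[inside] - lam_minus)) / (
        2.0 * np.pi * ratio * x[inside]
    )
    return dens
```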