🤖 AI Summary
This work investigates the asymptotic eigenvalue distribution of the Neural Tangent Kernel (NTK) for two-layer neural networks under the double-scaling regime $n/(dp) \to \gamma_1$ and $p/d \to \gamma_2$, where $n$ is the sample size, $d$ the input dimension, and $p$ the hidden width. Methodologically, it introduces free multiplicative convolution—previously unexploited in NTK spectral analysis—to rigorously characterize the limiting spectrum as the free multiplicative convolution of a Marchenko–Pastur distribution with a deterministic distribution determined by the activation function and the parameter scaling structure. Leveraging tools from random matrix theory, free probability, and pseudo-Lipschitz function analysis, the authors derive the exact limiting spectral distribution under i.i.d. weight assumptions. The contribution is twofold: first, it provides the first complete asymptotic characterization of the NTK spectrum in this double-scaling regime; second, it significantly extends the applicability of free probability to deep learning theory, establishing new mathematical foundations for analyzing training dynamics and generalization in wide neural networks.
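In symbols (notation introduced here for illustration; the paper's own symbols may differ), the claimed limit takes the form

\[
\mu_{\mathrm{NTK}} \;\longrightarrow\; \mu_{\mathrm{MP}} \boxtimes \nu_{\sigma, D},
\qquad \frac{n}{dp} \to \gamma_1, \quad \frac{p}{d} \to \gamma_2,
\]

where $\boxtimes$ denotes free multiplicative convolution, $\mu_{\mathrm{MP}}$ is a Marchenko–Pastur law, and $\nu_{\sigma,D}$ is the deterministic distribution determined by the activation $\sigma$ and the diagonal matrix $D$.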
📝 Abstract
We compute the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific scaling of dimension. Namely, if $X \in \mathbb{R}^{n \times d}$ is an i.i.d. random matrix, $W \in \mathbb{R}^{d \times p}$ is an i.i.d. $\mathcal{N}(0,1)$ matrix, and $D \in \mathbb{R}^{p \times p}$ is a diagonal matrix with i.i.d. bounded entries, we consider the matrix
\[
\mathrm{NTK}
=
\frac{1}{d} X X^\top
\odot
\frac{1}{p}\,
\sigma'\!\left(\frac{1}{\sqrt{d}} X W\right)
D^2\,
\sigma'\!\left(\frac{1}{\sqrt{d}} X W\right)^{\!\top}
\]
where $\sigma'$ is a pseudo-Lipschitz function applied entrywise, under the scaling $\frac{n}{dp} \to \gamma_1$ and $\frac{p}{d} \to \gamma_2$. We describe the asymptotic distribution as the free multiplicative convolution of the Marchenko--Pastur distribution with a deterministic distribution depending on $\sigma$ and $D$.
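The matrix model above is fully explicit, so it can be sampled directly. Below is a minimal numerical sketch (our illustration, not code from the paper) that draws the NTK matrix for $\sigma = \tanh$, whose derivative $\sigma'(x) = 1 - \tanh^2(x)$ is Lipschitz and hence pseudo-Lipschitz, and computes its empirical eigenvalues. The values of $\gamma_1$ and $\gamma_2$, the uniform law for the entries of $D$, and the concrete dimensions are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Double-scaling regime: n/(d*p) -> gamma1 and p/d -> gamma2.
# Concrete sizes are illustrative; larger d gives a cleaner histogram.
gamma1, gamma2 = 0.5, 2.0
d = 40
p = int(gamma2 * d)                         # hidden width, p/d ~ gamma2
n = int(gamma1 * d * p)                     # sample size, n/(d*p) ~ gamma1

X = rng.standard_normal((n, d))             # i.i.d. data matrix
W = rng.standard_normal((d, p))             # i.i.d. N(0, 1) weight matrix
D = np.diag(rng.uniform(0.5, 1.5, size=p))  # diagonal with i.i.d. bounded entries

S = X @ W / np.sqrt(d)                      # pre-activations (1/sqrt(d)) X W
Sig = 1.0 - np.tanh(S) ** 2                 # entrywise sigma'(x) for sigma = tanh

# NTK = (1/d) X X^T  (Hadamard product)  (1/p) sigma'(.) D^2 sigma'(.)^T
ntk = (X @ X.T / d) * (Sig @ D**2 @ Sig.T / p)

eigs = np.linalg.eigvalsh(ntk)              # empirical eigenvalue distribution
print(f"n = {n}, spectrum in [{eigs.min():.4f}, {eigs.max():.4f}]")
```

As $d$ grows (with $n$ and $p$ scaled accordingly), the histogram of `eigs` should approach the free multiplicative convolution described in the abstract.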