🤖 AI Summary
This work investigates whether the power-law spectral structure of input data with α-power-law covariance is preserved after a random linear projection followed by a degree-p monomial activation. Combining a dyadic head-tail spectral decomposition, Wick chaos expansions, and concentration inequalities from random matrix theory, the study gives a rigorous proof that, under broad conditions, the spectrum of the output feature covariance inherits the input's power-law exponent α, up to logarithmic corrections that depend on p. Specifically, the leading c₁ d log^{-(p+1)}(d) eigenvalues decay as (log^{p-1}(j+1)/j)^α, while the remaining eigenvalues decay as j^{-α} up to a polylogarithmic factor. Matching upper and lower bounds show that this spectral structure is robust under such nonlinear feature maps.
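In display form, the claimed two-regime characterization (restating the bounds summarized above and stated in the abstract below; $c_1$ is a constant and the second regime holds only up to a polylogarithmic factor) reads:

$$
\lambda_j\!\left(\mathbb{E}_{x}\Big[\tfrac{1}{d}\, f(W^\top x)^{\otimes 2}\Big]\right) \asymp
\begin{cases}
\left(\dfrac{\log^{p-1}(j+1)}{j}\right)^{\alpha}, & 1 \leq j \leq c_1\, d\, \log^{-(p+1)}(d),\\[1.2ex]
j^{-\alpha}\ \text{(up to a polylog factor)}, & c_1\, d\, \log^{-(p+1)}(d) \leq j \leq d.
\end{cases}
$$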
📝 Abstract
Scaling laws for neural networks, in which the loss decays as a power law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data $x \sim N(0,H) \in \mathbb{R}^v$ where $H$ has $\alpha$-power-law spectrum ($\lambda_j(H) \asymp j^{-\alpha}$, $\alpha > 1$), a Gaussian sketch matrix $W \in \mathbb{R}^{v \times d}$, and an entrywise monomial $f(y) = y^{p}$, we characterize the eigenvalues of the population random-feature covariance $\mathbb{E}_{x}\big[\tfrac{1}{d} f(W^\top x)^{\otimes 2}\big]$. We prove matching upper and lower bounds: for all $1 \leq j \leq c_1 d \log^{-(p+1)}(d)$, the $j$-th eigenvalue is of order $\left(\log^{p-1}(j+1)/j\right)^{\alpha}$. For $c_1 d \log^{-(p+1)}(d) \leq j \leq d$, the $j$-th eigenvalue is of order $j^{-\alpha}$ up to a polylog factor. That is, the power-law exponent $\alpha$ is inherited exactly from the input covariance, modified only by a logarithmic correction that depends on the monomial degree $p$. The proof combines a dyadic head-tail decomposition with Wick chaos expansions for higher-order monomials and random matrix concentration inequalities.
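The setup in the abstract can be checked numerically. Below is a minimal, illustrative sketch (not code from the paper): it draws $x \sim N(0,H)$ with a diagonal $H$ whose eigenvalues follow $j^{-\alpha}$, forms monomial random features $(W^\top x)^p$ with an i.i.d. standard-Gaussian sketch $W$ (the exact entry scaling of $W$ is an assumption here), estimates the feature second-moment matrix, and fits the log-log slope of its spectrum against the input exponent $\alpha$. All dimensions and sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameters: input dimension v, feature dimension d,
# power-law exponent alpha, monomial degree p, Monte Carlo sample count n.
v, d = 1000, 300
alpha, p = 1.5, 2
n = 10_000

# Input covariance H with alpha-power-law spectrum: lambda_j(H) = j^{-alpha}.
# H is taken diagonal; rotating H would not change the spectra of interest.
h = np.arange(1, v + 1, dtype=float) ** (-alpha)

# Samples x ~ N(0, H), stored as rows of X.
X = rng.standard_normal((n, v)) * np.sqrt(h)

# Gaussian sketch W in R^{v x d}; i.i.d. N(0, 1) entries are assumed here.
W = rng.standard_normal((v, d))

# Entrywise monomial random features f(W^T x) = (W^T x)^p.
Z = (X @ W) ** p

# Monte Carlo estimate of the population matrix E_x[(1/d) f(W^T x) f(W^T x)^T].
C = (Z.T @ Z) / (n * d)

# Descending eigenvalues of the feature second-moment matrix.
eigs = np.linalg.eigvalsh(C)[::-1]

# Fit the log-log decay rate over a middle range of indices (avoiding the
# log-corrected head and the noisy tail) and compare it with -alpha.
j = np.arange(1, d + 1)
mask = (j >= 5) & (j <= d // 2)
slope = np.polyfit(np.log(j[mask]), np.log(eigs[mask]), 1)[0]
print(f"fitted decay exponent ~ {-slope:.2f}  (input alpha = {alpha})")
```

With larger $n$, $v$, and $d$, the fitted exponent should move closer to $\alpha$ over the bulk of the spectrum, consistent with the inherited $j^{-\alpha}$ decay described above; the logarithmic corrections in the head regime are visible only for the very largest eigenvalues.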