Low Rank Gradients and Where to Find Them

📅 2025-10-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the origins and mechanisms underlying the low-rank structure of training loss gradients in two-layer neural networks under non-isotropic, ill-conditioned, and data-weight-dependent regimes. Departing from conventional isotropic assumptions, we adopt a spiked-data model and conduct theoretical analysis under both mean-field and neural tangent kernel (NTK) scaling paradigms. We rigorously establish that the gradient is well-approximated by two dominant rank-one components, whose emergence is jointly governed by data geometry, scaling protocol, and activation function properties. Furthermore, we demonstrate that common regularizers—including ℓ₂ penalty and weight decay—selectively suppress specific low-rank components. Our theoretical predictions are comprehensively validated on both synthetic and real-world datasets, confirming the ubiquity of gradient low-rankness and its controllability via targeted regularization strategies.

📝 Abstract
This paper investigates low-rank structure in the gradients of the training loss for two-layer neural networks while relaxing the usual isotropy assumptions on the training data and parameters. We consider a spiked data model in which the bulk can be anisotropic and ill-conditioned, we do not require the data and weight matrices to be independent, and we analyze both the mean-field and neural-tangent-kernel (NTK) scalings. We show that the gradient with respect to the input weights is approximately low rank and is dominated by two rank-one terms: one aligned with the bulk data residue, and another aligned with the rank-one spike in the input data. We characterize how properties of the training data, the scaling regime, and the activation function govern the balance between these two components. We also demonstrate that standard regularizers, such as weight decay, input noise, and Jacobian penalties, selectively modulate these components. Experiments on synthetic and real data corroborate our theoretical predictions.
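The setup in the abstract can be sketched numerically. The snippet below is an illustrative toy version, not the paper's exact model: it draws spiked data with an anisotropic bulk (all dimensions, scales, and targets are arbitrary choices), computes the squared-loss gradient with respect to the input weights of a two-layer tanh network, and inspects its singular values to gauge approximate low-rankness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (arbitrary for illustration): n samples, d inputs, m hidden units
n, d, m = 500, 100, 80

# Spiked data model: anisotropic, ill-conditioned bulk plus a rank-one spike
bulk = rng.normal(size=(n, d)) * np.linspace(0.2, 1.0, d)
u = rng.normal(size=d)
u /= np.linalg.norm(u)
X = bulk + 3.0 * rng.normal(size=(n, 1)) * u  # spike along direction u

# Two-layer network f(x) = a^T tanh(W x)
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.normal(size=m) / np.sqrt(m)
y = rng.normal(size=n)  # arbitrary targets

# Gradient of the mean squared loss w.r.t. the input weights W:
# dL/dW_{jk} = (1/n) sum_i res_i * a_j * tanh'(w_j . x_i) * x_ik
pre = X @ W.T                    # (n, m) pre-activations
res = np.tanh(pre) @ a - y       # (n,) residuals
G = ((res[:, None] * a) * (1.0 - np.tanh(pre) ** 2)).T @ X / n  # (m, d)

# Singular value ratios: a fast drop indicates approximate low rank
s = np.linalg.svd(G, compute_uv=False)
print(s[:4] / s[0])
```

Under the paper's result, the leading directions of such a gradient should align with the bulk residue and the spike direction `u`; this sketch only exposes the spectrum so those claims can be probed empirically.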
Problem

Research questions and friction points this paper is trying to address.

Investigates low-rank gradient structure in two-layer neural networks
Relaxes isotropy assumptions on training data and parameters
Characterizes how data properties govern the balance between gradient components
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes low-rank gradients in neural networks
Identifies two dominant rank-one gradient components
Shows regularizers selectively modulate gradient components