🤖 AI Summary
This work addresses the challenge that standard gradient descent struggles to align with the true signal under strongly anisotropic inputs, because it amplifies high-variance yet uninformative directions. Through a dynamical-systems analysis, the authors show that spectral gradient descent, exemplified by optimizers such as Muon, mitigates this anisotropy-induced alignment bias by preserving gradient direction while discarding magnitude information. In a spiked covariance model with anisotropic Gaussian inputs, and via an equivalence with training a two-layer neural network, both theory and experiments show that spectral gradient descent substantially improves alignment stability and convergence speed, overcoming key limitations of conventional gradient descent in noise suppression and signal recovery.
📝 Abstract
Spectral gradient methods, such as the Muon optimizer, modify gradient updates by preserving directional information while discarding scale, and have shown strong empirical performance in deep learning. We investigate the mechanisms underlying these gains through a dynamical analysis of a nonlinear phase retrieval model with anisotropic Gaussian inputs, a model equivalent to training a two-layer neural network with quadratic activation and fixed second-layer weights. Focusing on a spiked covariance setting where the dominant variance direction is orthogonal to the signal, we show that gradient descent (GD) suffers from variance-induced misalignment: during the early escaping stage, the high-variance but uninformative spike direction is multiplicatively amplified, degrading alignment with the true signal under strong anisotropy. In contrast, spectral gradient descent (SpecGD) removes this spike amplification effect, leading to stable alignment and accelerated noise contraction. Numerical experiments confirm the theory and show that these phenomena persist under broader anisotropic covariances.
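To make the "preserve direction, discard scale" idea concrete, here is a minimal sketch (not the paper's implementation) of the spectral update rule assumed by Muon-style optimizers: replace a matrix gradient by its polar factor obtained from the SVD, which sets all singular values to one. For a column-vector gradient this reduces to plain gradient normalization.

```python
import numpy as np

def spectral_update(G: np.ndarray) -> np.ndarray:
    """Polar factor U V^T of the gradient G = U S V^T.

    Keeps the singular directions of G but discards the singular
    values, so no single high-variance direction dominates the step.
    For a column vector this equals g / ||g||.
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

# Illustration: a gradient whose first column carries 10x the scale of
# the others (mimicking a high-variance spike direction) is rescaled so
# every spectral direction receives unit weight.
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4)) * np.array([10.0, 1.0, 1.0, 1.0])
M = spectral_update(G)
print(np.linalg.svd(M, compute_uv=False))  # all singular values are 1
```

Under a plain GD step the spike column would be amplified multiplicatively; the spectral step removes that amplification while leaving the update's directional content intact, which is the mechanism the analysis above attributes SpecGD's stable alignment to.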