AI Summary
This work investigates the mechanism by which gradient descent escapes spurious local minima in high-dimensional, strongly nonconvex optimization, using phase retrieval as a canonical example. Methodologically, we employ high-dimensional statistical-mechanics analysis, Hessian spectral theory, and dynamical modeling to characterize how the curvature of the loss landscape evolves. We identify a dynamical transition in the Hessian spectrum: during early iterations, local curvature provides an effective descent direction toward high-quality solutions; in later stages, that direction is lost and the dynamics become trapped near poor local minima. This new mechanism, supported by both theory and numerical experiments, shows that gradient descent in finite dimensions can recover the signal below the signal-to-noise ratio (SNR) threshold predicted in the infinite-dimensional limit. Numerical experiments confirm that a coarse, curvature-informed initialization, with no fine-tuned design, enables high-precision signal recovery at SNRs significantly below the infinite-dimensional threshold.
Abstract
We provide an analytical study of the evolution of the Hessian during gradient descent dynamics, and relate a transition in its spectral properties to the ability to find good minima. We focus on the phase retrieval problem as a case study for complex loss landscapes. We first characterize the high-dimensional limit where both the number $M$ and the dimension $N$ of the data go to infinity at fixed signal-to-noise ratio $\alpha = M/N$. For small $\alpha$, the Hessian is uninformative with respect to the signal. For $\alpha$ larger than a critical value, the Hessian displays at short times a downward direction pointing towards good minima. While descending, a transition in the spectrum takes place: the direction is lost and the system gets trapped in bad minima. Hence, the local landscape is benign and informative at first, before gradient descent brings the system into an uninformative maze. Through both theoretical analysis and numerical experiments, we show that this dynamical transition plays a crucial role for finite (even very large) $N$: it allows the system to recover the signal well before the algorithmic threshold corresponding to the $N \rightarrow \infty$ limit. Our analysis sheds light on this new mechanism that facilitates gradient descent dynamics in finite dimensions, and highlights the importance of a good initialization based on spectral properties for optimization in complex high-dimensional landscapes.
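To make the idea of a "downward direction pointing towards good minima" concrete, the sketch below builds the Hessian of a phase retrieval loss at a random starting point and measures how well its lowest-eigenvalue eigenvector aligns with the signal. Several ingredients are assumptions chosen for illustration and are not specified in the abstract: the quartic intensity loss $L(w) = \frac{1}{4M}\sum_i \left((a_i \cdot w)^2 - y_i\right)^2$, Gaussian sensing vectors $a_i$, a unit-norm signal, a small-norm random initialization, and all function and parameter names.

```python
import numpy as np

def make_problem(N, alpha, rng):
    """Assumed setup: Gaussian sensing vectors a_i, unit-norm signal x,
    noiseless intensities y_i = (a_i . x)^2, with M = alpha * N samples."""
    M = int(alpha * N)
    x = rng.standard_normal(N)
    x /= np.linalg.norm(x)
    A = rng.standard_normal((M, N))
    y = (A @ x) ** 2
    return A, x, y

def loss_grad(w, A, y):
    """Gradient of the assumed loss L(w) = 1/(4M) * sum_i ((a_i . w)^2 - y_i)^2."""
    u = A @ w
    return A.T @ ((u ** 2 - y) * u) / len(y)

def loss_hessian(w, A, y):
    """Hessian of the same loss: 1/M * sum_i (3 (a_i . w)^2 - y_i) a_i a_i^T."""
    u = A @ w
    c = 3.0 * u ** 2 - y           # per-sample curvature weight
    return (A.T * c) @ A / len(y)  # sum_i c_i a_i a_i^T, normalized by M

def softest_direction_overlap(N=30, alpha=100.0, init_norm=0.1, seed=0):
    """Return (smallest Hessian eigenvalue, |overlap| of its eigenvector with x)
    at a random initial point of norm init_norm (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    A, x, y = make_problem(N, alpha, rng)
    w0 = rng.standard_normal(N)
    w0 *= init_norm / np.linalg.norm(w0)
    H = loss_hessian(w0, A, y)
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    v_min = eigvecs[:, 0]                 # direction of most negative curvature
    return eigvals[0], abs(v_min @ x)

if __name__ == "__main__":
    for alpha in (1.0, 5.0, 100.0):
        lam, overlap = softest_direction_overlap(alpha=alpha)
        print(f"alpha={alpha:6.1f}  lambda_min={lam:+.3f}  overlap={overlap:.3f}")
```

Under these assumptions, an overlap close to 1 means the most negative curvature direction at initialization is informative about the signal, while an overlap of order $1/\sqrt{N}$ means it is not. The sketch probes only the starting point; following the spectrum along the gradient descent trajectory, as the abstract describes, would require recomputing `loss_hessian` at later iterates.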