From Zero to Hero: How local curvature at artless initial conditions leads away from bad minima

📅 2024-03-04

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates how gradient descent escapes spurious local minima in high-dimensional, strongly nonconvex optimization, using phase retrieval as a canonical example. The analysis combines high-dimensional statistical mechanics, Hessian spectral theory, and dynamical modeling to characterize how the curvature of the loss landscape evolves during descent. It identifies a dynamical transition in the Hessian spectrum: at early times, the local curvature provides a descent direction pointing toward good minima; later, this direction is lost and the dynamics become trapped near poor local minima. Because of this informative early phase, gradient descent in finite dimensions can recover the signal at signal-to-noise ratios (SNRs) below the algorithmic threshold of the infinite-dimensional limit, a result supported by both theory and numerical experiments. The experiments further indicate that a coarse initialization based on the spectral properties of the Hessian, with no fine-tuned design, enables high-precision signal recovery at SNRs significantly below that threshold.

📝 Abstract

We provide an analytical study of the evolution of the Hessian during gradient descent dynamics, and relate a transition in its spectral properties to the ability to find good minima. We focus on the phase retrieval problem as a case study for complex loss landscapes. We first characterize the high-dimensional limit where both the number $M$ and the dimension $N$ of the data go to infinity at fixed signal-to-noise ratio $\alpha = M/N$. For small $\alpha$, the Hessian is uninformative with respect to the signal. For $\alpha$ larger than a critical value, the Hessian displays at short times a downward direction pointing towards good minima. While descending, a transition in the spectrum takes place: the direction is lost and the system gets trapped in bad minima. Hence, the local landscape is benign and informative at first, before gradient descent brings the system into an uninformative maze. Through both theoretical analysis and numerical experiments, we show that this dynamical transition plays a crucial role for finite (even very large) $N$: it allows the system to recover the signal well before the algorithmic threshold corresponding to the $N \rightarrow \infty$ limit. Our analysis sheds light on this new mechanism that facilitates gradient descent dynamics in finite dimensions, and highlights the importance of a good initialization based on spectral properties for optimization in complex high-dimensional landscapes.
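
As a rough illustration of the claim that, at large $\alpha$, the Hessian at a random starting point has a downward direction correlated with the signal, the sketch below builds a small noiseless phase retrieval instance and measures the overlap between the lowest Hessian eigenvector and the signal. The quartic loss $\frac{1}{4M}\sum_i \left((x_i \cdot w)^2 - y_i^2\right)^2$, the Gaussian data, the unit-norm signal, and every function name are assumptions made for this example; the paper's own loss and normalization may differ.

```python
import numpy as np

def make_problem(N, M, rng):
    """Gaussian sensing vectors and a unit-norm signal (assumed setup)."""
    w_star = rng.standard_normal(N)
    w_star /= np.linalg.norm(w_star)
    X = rng.standard_normal((M, N))   # one sensing vector per row
    y = X @ w_star                    # only y**2 enters the loss below
    return X, y, w_star

def hessian(w, X, y):
    """Hessian of the assumed loss (1/4M) * sum(((X w)^2 - y^2)^2)."""
    y_hat = X @ w
    weights = 3.0 * y_hat**2 - y**2
    return (X * weights[:, None]).T @ X / X.shape[0]

def softest_direction_overlap(N, alpha, seed=0):
    """Lowest Hessian eigenvalue at a random unit-norm point, and the
    absolute overlap of its eigenvector with the signal."""
    rng = np.random.default_rng(seed)
    M = int(alpha * N)
    X, y, w_star = make_problem(N, M, rng)
    w0 = rng.standard_normal(N)
    w0 /= np.linalg.norm(w0)
    eigvals, eigvecs = np.linalg.eigh(hessian(w0, X, y))  # ascending order
    return eigvals[0], abs(eigvecs[:, 0] @ w_star)

if __name__ == "__main__":
    for alpha in (1.0, 10.0, 100.0):
        lam, ov = softest_direction_overlap(N=50, alpha=alpha)
        print(f"alpha={alpha:6.1f}  lambda_min={lam:+.3f}  overlap={ov:.3f}")
```

With this toy loss the overlap is expected to be close to one only when $\alpha$ is large; the value of $\alpha$ at which the eigenvector becomes informative depends on the loss and is not meant to reproduce the paper's critical value.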

Problem

Research questions and friction points this paper is trying to address.

Understanding gradient descent in non-convex high-dimensional landscapes

Analyzing Hessian dynamics during optimization for phase retrieval

Exploring early-time dynamics' role in escaping poor minima

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing Hessian dynamics during gradient descent

Identifying BBP transition in Hessian spectrum

Leveraging early-time dynamics for signal recovery
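
To make the idea of using early-time curvature for recovery concrete, here is a minimal sketch that starts gradient descent along the lowest Hessian eigenvector computed at a random point, then tracks the overlap with the signal. It assumes the same toy setting as a plain quartic loss $\frac{1}{4M}\sum_i \left((x_i \cdot w)^2 - y_i^2\right)^2$ with Gaussian data and a unit-norm signal; the function names, step size, and iteration count are illustrative choices, not the authors' procedure.

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of the assumed loss (1/4M) * sum(((X w)^2 - y^2)^2)."""
    y_hat = X @ w
    return X.T @ ((y_hat**2 - y**2) * y_hat) / X.shape[0]

def hessian(w, X, y):
    """Hessian of the same assumed loss."""
    y_hat = X @ w
    weights = 3.0 * y_hat**2 - y**2
    return (X * weights[:, None]).T @ X / X.shape[0]

def overlap(w, w_star):
    """Absolute cosine similarity; the sign of w is not identifiable."""
    return abs(w @ w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))

def curvature_init_then_descend(N=50, alpha=100.0, lr=0.05, steps=300, seed=0):
    rng = np.random.default_rng(seed)
    M = int(alpha * N)
    w_star = rng.standard_normal(N)
    w_star /= np.linalg.norm(w_star)
    X = rng.standard_normal((M, N))
    y = X @ w_star

    # Random unit-norm reference point at which the curvature is probed.
    w_rand = rng.standard_normal(N)
    w_rand /= np.linalg.norm(w_rand)

    # Initialize along the lowest-curvature direction (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(hessian(w_rand, X, y))
    w = eigvecs[:, 0].copy()
    start = overlap(w, w_star)

    # Plain full-batch gradient descent with a fixed step size.
    for _ in range(steps):
        w -= lr * gradient(w, X, y)
    return start, overlap(w, w_star)

if __name__ == "__main__":
    start, end = curvature_init_then_descend()
    print(f"overlap at initialization: {start:.3f}, after descent: {end:.3f}")
```

The defaults use a large $\alpha$ so that the toy problem is easy; exploring the regime near the threshold discussed in the paper would require larger $N$ and the paper's actual loss.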

🔎 Similar Papers

Geometry and Local Recovery of Global Minima of Two-layer Neural Networks at Overparameterization