🤖 AI Summary
This work investigates the direction of the first escape from the saddle at the origin under gradient descent in deep ReLU networks initialized with small weights. Methodologically, it combines local saddle-point dynamics, a precise characterization of the second-order structure of the loss, singular-value perturbation theory, and modeling of the nonlinear parameter manifold. Theoretically, it establishes that the weight matrices induced by the optimal escape direction exhibit a low-rank bias that strengthens with depth: in the ℓ-th layer, the first singular value is at least ℓ^{1/4} larger than any other singular value. This result provides a first theoretical foundation for "saddle-to-saddle" dynamics, in which gradient descent visits a sequence of saddles of progressively increasing bottleneck rank, thereby elucidating the structured evolution of deep optimization trajectories and yielding testable predictions about rank dynamics.
📝 Abstract
When a deep ReLU network is initialized with small weights, GD is at first dominated by the saddle at the origin in parameter space. We study the so-called escape directions, which play a role similar to that of the eigenvectors of the Hessian at strict saddles. We show that the optimal escape direction features a low-rank bias in its deeper layers: the first singular value of the $\ell$-th layer weight matrix is at least $\ell^{\frac{1}{4}}$ larger than any other singular value. We also prove a number of related results about these escape directions. We argue that this result is a first step toward proving Saddle-to-Saddle dynamics in deep ReLU networks, where GD visits a sequence of saddles with increasing bottleneck rank.
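The abstract's prediction is empirically testable: for each layer's weight matrix, the gap between the two largest singular values should grow with depth. The sketch below (a hypothetical illustration, not code from the paper) shows how one could measure these per-layer gaps; the synthetic weights are random matrices plus a rank-one spike scaling like $\ell^{1/4}$, mimicking the predicted low-rank bias.

```python
import numpy as np

def singular_value_gaps(weights):
    """For each layer weight matrix, return s1 - s2, the gap between
    the two largest singular values (the quantity predicted to grow
    with layer depth)."""
    gaps = []
    for W in weights:
        s = np.linalg.svd(W, compute_uv=False)  # sorted descending
        s2 = s[1] if len(s) > 1 else 0.0
        gaps.append(s[0] - s2)
    return gaps

# Synthetic example: small random matrices plus a rank-one spike whose
# strength grows like ell^{1/4}, as in the paper's predicted bias.
rng = np.random.default_rng(0)
d = 50
weights = []
for ell in range(1, 6):
    W = 0.01 * rng.standard_normal((d, d))      # small-weight "noise"
    u = rng.standard_normal(d); u /= np.linalg.norm(u)
    v = rng.standard_normal(d); v /= np.linalg.norm(v)
    W += ell ** 0.25 * np.outer(u, v)           # dominant direction
    weights.append(W)

gaps = singular_value_gaps(weights)  # increases with layer index ell
```

Applied to a real network's weights after the first escape from the origin, a depth-increasing gap would be consistent with the stated $\ell^{1/4}$ bound.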