🤖 AI Summary
This work investigates the direction of the first escape from the saddle at the origin under gradient descent in deep ReLU networks initialized with small weights. Methodologically, it combines local saddle-point dynamics, a precise characterization of the second-order structure of the loss, singular-value perturbation theory, and modeling of the nonlinear parameter manifold. Theoretically, it establishes that the weight matrices induced by the optimal escape direction exhibit a low-rank bias that strengthens with depth: in the ℓ-th layer, the first singular value is at least ℓ^{1/4} larger than any other singular value. This result provides a first theoretical foundation for "saddle-to-saddle" dynamics, in which gradient descent visits a sequence of saddles of progressively increasing bottleneck rank, thereby elucidating the structured evolution of deep optimization trajectories and yielding testable predictions about rank dynamics.
📝 Abstract
When a deep ReLU network is initialized with small weights, GD is at first dominated by the saddle at the origin in parameter space. We study the so-called escape directions, which play a role similar to that of the eigenvectors of the Hessian at strict saddles. We show that the optimal escape direction features a low-rank bias in its deeper layers: the first singular value of the $\ell$-th layer weight matrix is at least $\ell^{\frac{1}{4}}$ larger than any other singular value. We also prove a number of related results about these escape directions. We argue that this result is a first step toward proving Saddle-to-Saddle dynamics in deep ReLU networks, where GD visits a sequence of saddles with increasing bottleneck rank.
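The abstract's prediction is empirically testable: for each layer's weight matrix, the gap between the two largest singular values should grow with depth. The sketch below (a hypothetical illustration, not code from the paper) shows how one could measure these per-layer gaps; the synthetic weights are random matrices plus a rank-one spike scaling like $\ell^{1/4}$, mimicking the predicted low-rank bias.

```python
import numpy as np

def singular_value_gaps(weights):
    """For each layer weight matrix, return s1 - s2, the gap between
    the two largest singular values (the quantity predicted to grow
    with layer depth)."""
    gaps = []
    for W in weights:
        s = np.linalg.svd(W, compute_uv=False)  # sorted descending
        s2 = s[1] if len(s) > 1 else 0.0
        gaps.append(s[0] - s2)
    return gaps

# Synthetic example: small random matrices plus a rank-one spike whose
# strength grows like ell^{1/4}, as in the paper's predicted bias.
rng = np.random.default_rng(0)
d = 50
weights = []
for ell in range(1, 6):
    W = 0.01 * rng.standard_normal((d, d))      # small-weight "noise"
    u = rng.standard_normal(d); u /= np.linalg.norm(u)
    v = rng.standard_normal(d); v /= np.linalg.norm(v)
    W += ell ** 0.25 * np.outer(u, v)           # dominant direction
    weights.append(W)

gaps = singular_value_gaps(weights)  # increases with layer index ell
```

Applied to a real network's weights after the first escape from the origin, a depth-increasing gap would be consistent with the stated $\ell^{1/4}$ bound.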