The Origin of Edge of Stability

📅 2026-04-22

🤖 AI Summary

This work explains why full-batch gradient descent on neural networks drives the largest Hessian eigenvalue toward the threshold $2/\eta$, where $\eta$ is the learning rate. The authors introduce the edge coupling, a functional on pairs of consecutive iterates whose coefficient is fixed uniquely by the gradient-descent update. Differencing its criticality condition gives a step recurrence with stability boundary $2/\eta$, and a second-order expansion gives a loss-change formula whose telescoping sum forces curvature toward $2/\eta$. Because the mean value theorem localizes both Hessian averages to the true Hessian at an interior point of the step segment, the forcing is exact, with no gap. The same framework classifies fixed points and period-two orbits, identifying which directions support period-two orbits and on which side of the critical learning rate they appear.

📝 Abstract

Full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the threshold $2/\eta$, where $\eta$ is the learning rate. This phenomenon, the Edge of Stability, has resisted a unified explanation: existing accounts establish self-regulation near the edge but do not explain why the trajectory is forced toward $2/\eta$ from arbitrary initialization. We introduce the edge coupling, a functional on consecutive iterate pairs whose coefficient is uniquely fixed by the gradient-descent update. Differencing its criticality condition yields a step recurrence with stability boundary $2/\eta$, and a second-order expansion yields a loss-change formula whose telescoping sum forces curvature toward $2/\eta$. The two formulas involve different Hessian averages, but the mean value theorem localizes each to the true Hessian at an interior point of the step segment, yielding exact forcing of the Hessian eigenvalue with no gap. Setting both gradients of the edge coupling to zero classifies fixed points and period-two orbits; near a fixed point, the problem reduces to a function of the half-amplitude alone, which determines which directions support period-two orbits and on which side of the critical learning rate they appear.
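As a rough one-dimensional illustration of the threshold and of period-two orbits (not the paper's edge coupling construction), the Python sketch below makes two assumptions of its own. First, it uses the quadratic f(x) = 0.5 * c * x**2, for which gradient descent multiplies x by (1 - lr * c) each step and therefore contracts only when the curvature c is below 2/lr. Second, it uses a hypothetical quartic f(x) = x**2/2 - x**4/4, chosen only because its period-two condition can be solved by hand. All function names are illustrative.

```python
def gd_quadratic(curvature, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2; return |x| at the end."""
    x = x0
    for _ in range(steps):
        x = x - lr * curvature * x  # the gradient of f is curvature * x
    return abs(x)


def gd_quartic_step(x, lr):
    """One gradient-descent step on the toy loss f(x) = x**2/2 - x**4/4."""
    return x - lr * (x - x**3)  # the gradient of f is x - x**3


def period_two_half_amplitude(lr):
    """Half-amplitude a of a symmetric orbit a -> -a -> a on the toy quartic.

    Requiring -a = a - lr * (a - a**3) gives a**2 = 1 - 2/lr, so for this
    particular loss the orbit exists only above the critical rate lr = 2
    (the curvature at the fixed point x = 0 is 1, so 2/curvature = 2).
    """
    a_squared = 1.0 - 2.0 / lr
    return a_squared ** 0.5 if a_squared > 0 else None


lr = 0.1
threshold = 2.0 / lr                          # stability boundary for the curvature
below = gd_quadratic(0.95 * threshold, lr)    # multiplier -0.9: iterates shrink
above = gd_quadratic(1.05 * threshold, lr)    # multiplier -1.1: iterates grow

a = period_two_half_amplitude(2.1)            # just above the critical rate
flipped = gd_quartic_step(a, 2.1)             # should land on -a

print(f"below threshold: |x| = {below:.3e}")
print(f"above threshold: |x| = {above:.3e}")
print(f"period-two check: a = {a:.6f}, next iterate = {flipped:.6f}")
```

In this toy quartic the curvature decreases away from the fixed point, and the orbit appears on the high-learning-rate side; flipping the sign of the quartic term moves the orbit to the other side, which is the kind of one-sidedness the abstract's half-amplitude reduction describes in general.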

Problem

Research questions and friction points this paper is trying to address.

Edge of Stability

Hessian eigenvalue

gradient descent

neural networks

learning rate

Innovation

Methods, ideas, or system contributions that make the work stand out.

Edge of Stability

edge coupling

Hessian eigenvalue