Rod Flow: A Continuous-Time Model for Gradient Descent at the Edge of Stability

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the dynamical behavior of large-step gradient descent near the stability boundary in non-convex optimization, focusing on divergence and self-stabilization phenomena. To this end, the authors propose Rod Flow, an ordinary differential equation (ODE) model that interprets the gradient descent iterates as an extended, one-dimensional "rod"-like physical object, yielding, for the first time, an explicit and computationally efficient dynamical approximation derived from a physical picture. The model accurately predicts the critical sharpness threshold and explains the self-stabilization mechanism in quartic potentials. Theoretical analysis combined with numerical experiments shows that Rod Flow is highly accurate on both toy models and standard neural networks, matching the accuracy of Central Flow at significantly lower computational cost.

📝 Abstract
How can we understand gradient-based training over non-convex landscapes? The edge of stability phenomenon, introduced in Cohen et al. (2021), indicates that the answer is not so simple: namely, gradient descent (GD) with large step sizes often diverges away from the gradient flow. In this regime, the "Central Flow", recently proposed in Cohen et al. (2025), provides an accurate ODE approximation to the GD dynamics over many architectures. In this work, we propose Rod Flow, an alternative ODE approximation, which carries the following advantages: (1) it rests on a principled derivation stemming from a physical picture of GD iterates as an extended one-dimensional object -- a "rod"; (2) it better captures GD dynamics for simple toy examples and matches the accuracy of Central Flow for representative neural network architectures; and (3) it is explicit and cheap to compute. Theoretically, we prove that Rod Flow correctly predicts the critical sharpness threshold and explains self-stabilization in quartic potentials. We validate our theory with a range of numerical experiments.
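As background for the critical sharpness threshold the abstract refers to, here is a minimal toy illustration (not the paper's Rod Flow model, and not drawn from the paper): on a one-dimensional quadratic loss, GD with step size eta contracts exactly when the sharpness (the second derivative) stays below the classical stability threshold 2 / eta.

```python
def gd(lam, eta, x0=1.0, steps=50):
    """Run gradient descent on the quadratic f(x) = 0.5 * lam * x**2.

    Each step multiplies x by (1 - eta * lam), so |x| shrinks iff
    the sharpness lam is below the stability threshold 2 / eta.
    """
    x = x0
    for _ in range(steps):
        x -= eta * lam * x  # GD step: x <- x - eta * f'(x)
    return x

eta = 0.1  # stability threshold: sharpness must stay below 2 / eta = 20
print(abs(gd(lam=19.0, eta=eta)))  # sharpness 19 < 20: iterate contracts toward 0
print(abs(gd(lam=21.0, eta=eta)))  # sharpness 21 > 20: iterate blows up
```

On non-quadratic landscapes (e.g. the quartic potentials analyzed in the paper), the sharpness itself evolves along the trajectory, which is precisely the regime where ODE models such as Central Flow and Rod Flow are needed.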
Problem

Research questions and friction points this paper is trying to address.

edge of stability
gradient descent
non-convex optimization
ODE approximation
sharpness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rod Flow
gradient descent
edge of stability
ODE approximation
self-stabilization