🤖 AI Summary
Existing continuous-time models struggle to capture the dynamics of momentum-based optimizers such as Adam at the edge of stability. This work extends rod flow, a framework originally introduced for gradient descent, to eight momentum and adaptive optimizers, including heavy ball, Nesterov, RMSProp, Adam, and NAdam. By working in the joint phase space of parameters and first moment, and treating the second moment as a smooth auxiliary variable, the approach yields a continuous-time approximation of the discrete optimizer trajectories. Empirical evaluations on representative machine learning architectures show that rod flow tracks the discrete iterates through the edge-of-stability regime significantly more accurately than the corresponding stable flow.
📝 Abstract
Cohen et al. (arXiv:2207.14484) observed that adaptive gradient methods such as Adam operate at the edge of stability. While there has been significant work on continuous-time modeling of gradient descent at the edge of stability, the extension of these models to momentum methods remains underdeveloped. In the gradient descent setting, Regis et al. (arXiv:2602.01480) introduced rod flow, which models consecutive iterates as an extended one-dimensional object, a "rod." Here we extend rod flow to Adam by working in the joint phase space of parameters and first moment $(w, m)$ and treating the second moment $\nu$ as a smooth auxiliary variable. We also develop rod flows for heavy ball momentum, Nesterov momentum, and scalar and per-component versions of RMSProp, Adam, and NAdam. For all eight optimizers, we empirically evaluate rod flow on representative machine learning architectures, where it tracks the discrete iterates through the edge-of-stability regime significantly more accurately than the corresponding stable flow.
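To make the state variables concrete, the following is a minimal sketch of the standard discrete Adam update, whose iterates a continuous-time model such as rod flow aims to track. It is not the rod flow construction itself, which the abstract does not specify. The function names `adam_step` and `run_adam`, the scalar parameter, the quadratic loss $L(w) = \tfrac{1}{2} a w^2$, and the default hyperparameters are all illustrative assumptions. The code writes the second moment $\nu$ as `v`.

```python
import math


def adam_step(w, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the standard Adam update on a scalar parameter.

    w    : parameter
    m    : first-moment estimate (exponential average of gradients)
    v    : second-moment estimate (exponential average of squared gradients)
    grad : gradient of the loss at w
    t    : 1-indexed step count, used for bias correction
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def run_adam(w0, a, steps, **hyper):
    """Run Adam on the illustrative quadratic L(w) = 0.5 * a * w**2.

    Returns the list of (w, m, v) states, including the initial state.
    """
    w, m, v = w0, 0.0, 0.0
    trajectory = [(w, m, v)]
    for t in range(1, steps + 1):
        grad = a * w  # gradient of 0.5 * a * w**2
        w, m, v = adam_step(w, m, v, grad, t, **hyper)
        trajectory.append((w, m, v))
    return trajectory


if __name__ == "__main__":
    for w, m, v in run_adam(w0=1.0, a=2.0, steps=5):
        print(f"w={w:.6f}  m={m:.6f}  v={v:.6f}")
```

In this sketch the pair `(w, m)` corresponds to the joint phase space $(w, m)$ named in the abstract, while `v` changes slowly when `beta2` is close to 1, which is consistent with treating the second moment as a smooth auxiliary variable.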