Adaptive Momentum and Nonlinear Damping for Neural Network Training

📅 2026-01-30
🤖 AI Summary
This work addresses the challenge of balancing stability and convergence speed in momentum-based optimization methods when training large-scale neural networks over complex loss landscapes. We propose a continuous-time dynamical systems framework that incorporates cubic nonlinear damping—inspired by structural dynamics—together with parameter-wise adaptive momentum and kinetic energy feedback control. This mechanism dynamically responds to local curvature, enabling an effective trade-off between stability and rapid convergence. Building upon this framework, we develop enhanced variants of momentum SGD (mSGD) and Adam equipped with cubic damping. Empirical evaluations on Vision Transformers (ViT), BERT, and GPT-2 demonstrate that our methods match or surpass standard Adam in performance, while theoretical analysis establishes their exponential convergence guarantees.
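The continuous-time mechanism described above can be sketched as a damped second-order system. The specific form below is an illustrative assumption based on the summary (the cubic term, the per-parameter coefficient $\gamma_i$, the feedback gain $\alpha$, and the target energy $E^{*}$ are not taken from the paper's equations):

```latex
% Hypothetical sketch of the damped dynamics; coefficients are assumptions.
\begin{aligned}
\dot{\theta}_i &= p_i, \\
\dot{p}_i &= -\nabla_{\!i} f(\theta) \;-\; \gamma_i(t)\, p_i \;-\; c\, p_i^{3}, \\
\dot{\gamma}_i &= \alpha \left( \tfrac{1}{2} p_i^{2} - E^{*} \right),
\end{aligned}
```

Here the cubic term $-c\,p_i^{3}$ plays the role of the structural-dynamics-inspired damping, and each momentum coefficient $\gamma_i$ is driven by the kinetic energy $\tfrac{1}{2}p_i^{2}$ of its own parameter, stiffening the friction where velocities grow large.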

📝 Abstract
We propose a continuous-time scheme for large-scale optimization that introduces individual, adaptive momentum coefficients regulated by the kinetic energy of each model parameter. This approach automatically adjusts to local landscape curvature to maintain stability without sacrificing convergence speed. We show that our adaptive friction can be related to cubic damping, a suppression mechanism from structural dynamics. Furthermore, we introduce two specific optimization schemes by augmenting the continuous dynamics of mSGD and Adam with a cubic damping term. Empirically, our methods demonstrate robustness and match or outperform Adam on ViT, BERT, and GPT-2 training tasks, where mSGD typically struggles. We further provide theoretical results establishing the exponential convergence of the proposed schemes.
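A discrete-time sketch of the mSGD variant described in the abstract is given below. This is not the authors' exact scheme: the update form, the elementwise cubic term `c * v**3`, and the coefficient values are assumptions made for illustration.

```python
import numpy as np

def msgd_cubic_step(x, v, grad, lr=0.01, gamma=0.9, c=0.1):
    """One momentum-SGD step augmented with cubic damping.

    Illustrative sketch only: the placement of the cubic term and the
    coefficient names (gamma, c) are assumptions based on the abstract,
    not the paper's stated update rule.
    """
    # The cubic term -c * v**3 suppresses large velocity components much
    # more strongly than the linear friction implied by gamma alone.
    v = gamma * v - lr * grad - c * v**3
    x = x + v
    return x, v

# Usage on a toy quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
x = np.array([1.0, -2.0])
v = np.zeros_like(x)
for _ in range(200):
    x, v = msgd_cubic_step(x, v, grad=x)
```

On this convex toy problem the iterates spiral into the minimizer at the origin; the cubic term mainly shapes the early, high-velocity phase and becomes negligible as `v` shrinks.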
Problem

Research questions and friction points this paper is trying to address.

adaptive momentum
nonlinear damping
neural network training
optimization stability
convergence speed
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive momentum
cubic damping
continuous-time optimization
exponential convergence
neural network training
Authors

Aikaterini Karoni, School of Mathematics, University of Bristol
Rajit Rajpal, School of Mathematics, University of Edinburgh
Benedict Leimkuhler, University of Edinburgh (numerical analysis, computational statistics, high performance scientific computing, molecular …)
Gabriel Stoltz, CERMICS, Ecole des Ponts