A Physics-Inspired Optimizer: Velocity Regularized Adam

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing adaptive optimizers (e.g., Adam) suffer from oscillating parameter updates near the edge of stability, which slows convergence. To address this, the authors propose VRAdam, an optimizer that introduces a fourth-order kinetic regularization inspired by velocity-dependent physical systems, yielding velocity-aware decay of the learning rate. VRAdam combines per-parameter adaptive scaling, dynamic learning-rate scheduling, and velocity regularization, and is compatible with diverse architectures including CNNs, Transformers, and GFlowNets. Experiments show that VRAdam consistently outperforms AdamW on image classification, language modeling, and image generation: it converges faster, generalizes better, and trains stably at significantly larger base learning rates, mitigating the oscillation bottleneck of adaptive optimizers operating near marginal stability.

📝 Abstract
We introduce Velocity-Regularized Adam (VRAdam), a physics-inspired optimizer for training deep neural networks that draws on the stabilizing effect that quartic kinetic-energy terms have on various system dynamics. Previous algorithms, including the ubiquitous Adam, operate in the so-called adaptive edge-of-stability regime during training, leading to rapid oscillations and slowed convergence of the loss. VRAdam instead adds a higher-order penalty on the learning rate based on the velocity, so that the algorithm automatically slows down whenever weight updates become large. In practice, we observe that the effective dynamic learning rate shrinks in high-velocity regimes, damping oscillations and allowing a more aggressive base step size when necessary without divergence. By combining this velocity-based regularizer for global damping with Adam's per-parameter scaling to create a hybrid optimizer, we demonstrate that VRAdam consistently outperforms standard optimizers, including AdamW. We benchmark tasks such as image classification, language modeling, image generation, and generative modeling, using diverse architectures and training methodologies including Convolutional Neural Networks (CNNs), Transformers, and GFlowNets.
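The abstract describes the core mechanism: an Adam-style per-parameter update, globally damped by a higher-order (fourth-order) penalty that shrinks the effective learning rate when update velocity is large. The exact penalty form is not given in this summary, so the sketch below is an assumption for illustration: it uses `lr_eff = lr / (1 + gamma * ||step||^4)`, where `gamma` is a hypothetical damping coefficient, on top of a standard Adam step.

```python
# Hedged sketch of a VRAdam-style update. The quartic penalty form and the
# name `gamma` are assumptions, not the paper's exact formulation.
import numpy as np

def vradam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                eps=1e-8, gamma=1.0):
    """One update: Adam per-parameter scaling plus a global velocity-based
    damping of the learning rate (assumed ||step||^4 penalty)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment ("velocity")
    v = beta2 * v + (1 - beta2) * grad**2     # second moment
    m_hat = m / (1 - beta1**t)                # bias corrections
    v_hat = v / (1 - beta2**t)
    step = m_hat / (np.sqrt(v_hat) + eps)     # standard Adam direction
    # Global damping: the effective learning rate shrinks in high-velocity
    # regimes, so large updates automatically slow the optimizer down.
    lr_eff = lr / (1.0 + gamma * np.sum(step**2)**2)
    theta = theta - lr_eff * step
    return theta, m, v
```

Note the design point from the abstract: the damping is global (one scalar `lr_eff` shared across parameters), while Adam's second-moment scaling remains per-parameter, which is what makes the scheme a hybrid of the two.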
Problem

Research questions and friction points this paper is trying to address.

Addresses rapid oscillations in neural network training
Improves convergence by damping the learning rate based on update velocity
Enhances optimizer performance across diverse deep learning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Velocity-Regularized Adam (VRAdam) optimizer
Higher-order, velocity-based penalty on the learning rate
Hybrid optimizer combining the velocity-based regularizer with Adam's per-parameter scaling