🤖 AI Summary
To address slow convergence and poor stability during the early stages of deep learning training, this paper proposes the EAGLE optimizer. Methodologically, EAGLE introduces two key innovations: (1) it estimates local loss curvature by jointly modeling gradient and parameter updates across consecutive steps—enabling curvature-aware, adaptive initialization of the learning rate; and (2) it implements a dynamic switching mechanism between Adam and EAGLE modes to balance rapid convergence and training robustness. The approach integrates gradient differencing, local curvature approximation, and adaptive scheduling within the standard Adam framework. Extensive experiments on multiple benchmark datasets demonstrate that EAGLE significantly accelerates training loss reduction, achieving comparable final accuracy with 30–50% fewer training epochs. Moreover, it consistently outperforms baseline optimizers—including SGD and Adam—in both convergence speed and stability.
📝 Abstract
We propose the EAGLE update rule, a novel optimization method that accelerates loss convergence during the early stages of training by leveraging both current- and previous-step parameter and gradient values. The algorithm estimates optimal parameters by computing the changes in parameters and gradients between consecutive training steps and exploiting the local curvature of the loss landscape derived from these changes. Because this update rule can be unstable, we introduce an adaptive switching mechanism that dynamically selects between the Adam and EAGLE update rules to enhance training stability. Experiments on standard benchmark datasets demonstrate that the EAGLE optimizer, which combines the novel update rule with this switching mechanism, achieves rapid training-loss convergence in fewer epochs than conventional optimization methods.
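The core idea described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact formulation: it estimates per-parameter curvature with a secant (finite-difference) of consecutive gradients, takes a Newton-like step where that estimate is positive and well-conditioned, and falls back to a caller-supplied Adam-style update elsewhere. The function name, thresholds, and switching condition are assumptions for illustration.

```python
import numpy as np

def eagle_like_step(theta, theta_prev, grad, grad_prev, adam_update, eps=1e-8):
    """Hedged sketch of a curvature-aware update with Adam fallback.

    theta, theta_prev: current and previous parameter vectors
    grad, grad_prev:   gradients at those parameters
    adam_update:       the additive step Adam would take (fallback)
    """
    d_theta = theta - theta_prev
    d_grad = grad - grad_prev
    # Guard against parameters that did not move (division by ~0).
    moved = np.abs(d_theta) > eps
    # Diagonal secant curvature estimate: h_i ~ (g_t - g_{t-1}) / (θ_t - θ_{t-1}).
    h = np.where(moved, d_grad / np.where(moved, d_theta, 1.0), 0.0)
    # Trust the curvature step only where curvature is positive (locally convex).
    use_eagle = moved & (h > eps)
    # Newton-like EAGLE step: θ - g / h, elementwise.
    newton = np.where(use_eagle, theta - grad / np.where(use_eagle, h, 1.0), 0.0)
    # Switching mechanism: EAGLE where trusted, Adam's proposal elsewhere.
    return np.where(use_eagle, newton, theta + adam_update)
```

On a one-dimensional quadratic loss, the secant estimate recovers the true curvature and the EAGLE branch jumps to the minimum in a single step, which illustrates why such a rule can converge quickly early in training when the local quadratic model is accurate.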