🤖 AI Summary
Existing learning rate schedulers (e.g., cosine decay) rely on predefined annealing curves and lack real-time responsiveness to training dynamics. This work proposes GreedyLR, a loss-aware greedy adaptive scheduler that is easy to implement, computationally efficient, and integrates seamlessly with mainstream deep learning frameworks. Its core contributions are threefold: (1) a loss-difference-driven dynamic step-size scaling mechanism; (2) theoretical convergence guarantees, along with derivation of the optimal scaling factor that maximizes the convergence rate; and (3) demonstrated robustness to gradient noise. Experiments across NLP, CV, and 7B-scale model pretraining and fine-tuning show that GreedyLR accelerates convergence and consistently surpasses state-of-the-art schedulers, including cosine annealing and linear decay, in final accuracy.
📝 Abstract
Despite significant advances in optimizers for training, most research works use common scheduler choices such as cosine or exponential decay. In this paper, we study \emph{GreedyLR}, a novel scheduler that adaptively adjusts the learning rate during training based on the current loss. To validate the effectiveness of our proposed scheduler, we conduct experiments on several NLP, CV, and LLM tasks with up to $7B$ parameters, including both fine-tuning and pre-training experiments. The results show that our approach outperforms several state-of-the-art schedulers in terms of accuracy, speed, and convergence. We also provide a theoretical analysis of the GreedyLR algorithm, including a proof of convergence and a derivation of the optimal scaling factor $F$ that maximizes the convergence rate, along with experiments showing the algorithm's robustness to realistic noisy loss landscapes. Our scheduler is easy to implement, computationally efficient, and could be considered a good default scheduler for training.
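The greedy idea described above can be sketched in a few lines: raise the learning rate when the loss improves, lower it when the loss worsens. The class name, constructor parameters, and default values below are illustrative assumptions, not the authors' reference implementation; the scaling factor corresponds to the $F$ analyzed in the paper.

```python
class GreedyLRSketch:
    """Illustrative sketch of a greedy, loss-driven LR scheduler.

    If the loss improved since the previous step, the learning rate is
    multiplied by the scaling factor F; otherwise it is divided by F.
    All defaults here are assumed for illustration.
    """

    def __init__(self, lr=1e-3, factor=1.1, min_lr=1e-6, max_lr=1.0):
        self.lr = lr
        self.factor = factor      # scaling factor F
        self.min_lr = min_lr      # clamp to keep the LR in a sane range
        self.max_lr = max_lr
        self.prev_loss = None

    def step(self, loss):
        # First call: no previous loss to compare against, keep lr as-is.
        if self.prev_loss is not None:
            scale = self.factor if loss < self.prev_loss else 1.0 / self.factor
            self.lr = min(self.max_lr, max(self.min_lr, self.lr * scale))
        self.prev_loss = loss
        return self.lr
```

In a real training loop, `step(loss)` would be called once per iteration (or per epoch) and the returned rate written into the optimizer's parameter groups; a production version would likely smooth the loss signal before comparing, to avoid reacting to minibatch noise.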