🤖 AI Summary
Learning rate control lacks cross-task and cross-setting reliability in deep learning. Method: We systematically evaluate multi-fidelity hyperparameter optimization, fixed-hyperparameter annealing schedules, gradient-statistics-based online schedulers, and hyperparameter-free methods across diverse tasks and model scales. Contribution/Results: Experiments reveal that while mainstream approaches perform well on specific tasks, they are not reliable across settings, and hyperparameter optimization becomes markedly less effective as models and tasks grow in complexity, even when combined with multi-fidelity techniques. These findings highlight the need for algorithm selection methods in learning rate control, which have been neglected so far by both the AutoML and deep learning communities. The work also identifies promising directions for practical impact: (i) a focus on more relevant test tasks, (ii) finetunable methods, and (iii) meta-learning, establishing actionable pathways toward robust, adaptive learning rate control.
📝 Abstract
The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic hyperparameter optimization to online scheduling based on gradient statistics. This paper compares these paradigms to assess the current state of learning rate control. We find that methods from multi-fidelity hyperparameter optimization, fixed-hyperparameter schedules, and hyperparameter-free learning often perform very well on selected deep learning tasks but are not reliable across settings. This highlights the need for algorithm selection methods in learning rate control, which have been neglected so far by both the AutoML and deep learning communities. We also observe a trend of hyperparameter optimization approaches becoming less effective as models and tasks grow in complexity, even when combined with multi-fidelity techniques for more expensive training runs. A focus on more relevant test tasks and new promising directions like finetunable methods and meta-learning will enable the AutoML community to significantly strengthen its impact on this crucial factor in deep learning.
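To make the "fixed-hyperparameter schedule" paradigm concrete, here is a minimal sketch of cosine annealing, one of the most common fixed schedules in deep learning. The function name and signature are illustrative, not taken from the paper; the point is that all schedule behavior is fixed up front by a few hyperparameters (peak learning rate, floor, total steps), with no online adaptation to gradient statistics.

```python
import math

def cosine_annealing(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate at a given training step.

    Illustrative sketch of a fixed-hyperparameter schedule: the entire
    trajectory is determined before training by lr_max, lr_min, and
    total_steps, in contrast to online schedulers that react to
    gradient statistics during training.
    """
    progress = min(step / total_steps, 1.0)
    # Decays smoothly from lr_max (progress=0) to lr_min (progress=1).
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

print(cosine_annealing(0, 1000, 0.1))     # peak learning rate at the start
print(cosine_annealing(1000, 1000, 0.1))  # floor learning rate at the end
```

Because the whole schedule is fixed in advance, its quality hinges on tuning those few hyperparameters per task, which is exactly where the cross-setting reliability issues discussed above arise.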