🤖 AI Summary
To address the inefficiency and poor generalizability of manual hyperparameter tuning—particularly for learning rates—this paper proposes a dynamic online meta-optimization framework that formulates learning rate adaptation as a discounted cumulative regret minimization problem over time. The method employs a gradient-based meta-update mechanism, enabling plug-and-play integration with any first-order optimizer (e.g., SGD, Adam) to achieve decoupled, real-time, adaptive step-size optimization. Key contributions include: (i) the first formalization of meta-optimization as discounted regret minimization; and (ii) a low-complexity variant that preserves theoretical rigor while ensuring computational efficiency and strong generalization. Experiments across diverse tasks demonstrate faster convergence, enhanced robustness to initialization and task heterogeneity, competitive performance against hand-tuned optimal schedulers, and significantly lower computational overhead compared to conventional hyperparameter search methods.
📝 Abstract
This paper addresses the challenge of optimizing meta-parameters (i.e., hyperparameters) in machine learning algorithms, a critical factor influencing training efficiency and model performance. Moving away from computationally expensive traditional meta-parameter search methods, we introduce the MetaOptimize framework, which dynamically adjusts meta-parameters, particularly step sizes (also known as learning rates), during training. More specifically, MetaOptimize can wrap around any first-order optimization algorithm, tuning step sizes on the fly to minimize a specific form of regret that accounts for the long-term effect of step sizes on training through a discounted sum of future losses. We also introduce low-complexity variants of MetaOptimize that, together with its adaptability to multiple optimization algorithms, achieve performance competitive with the best hand-crafted learning rate schedules across various machine learning applications.
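The core idea described above, adapting the step size online with a gradient-based meta-update instead of fixing it in advance, can be illustrated with a small hypergradient-style sketch. This is only a minimal sketch of the general idea, not the paper's MetaOptimize algorithm: the wrapped optimizer (plain SGD), the toy quadratic loss, and all constants below are hypothetical choices for illustration.

```python
import numpy as np

def sgd_with_metastep(grad_fn, w, alpha=0.01, beta=0.001, steps=200):
    """Plain SGD wrapped with an online step-size adaptation rule.

    After each SGD step, the step size alpha is nudged by the inner
    product of successive gradients (a hypergradient-style update):
    when consecutive gradients align, alpha grows; when they oppose,
    alpha shrinks. This mimics, in spirit, tuning the step size on
    the fly rather than searching for it offline. It is NOT the
    paper's discounted-regret formulation, just an illustrative stand-in.
    """
    g_prev = None
    for _ in range(steps):
        g = grad_fn(w)
        if g_prev is not None:
            # Meta-update of the step size itself.
            alpha += beta * float(np.dot(g, g_prev))
            alpha = max(alpha, 0.0)  # keep the step size non-negative
        w = w - alpha * g            # ordinary SGD step with the current alpha
        g_prev = g
    return w, alpha

# Toy problem: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w0 = np.array([1.0, 1.0])
w_final, alpha_final = sgd_with_metastep(lambda w: w, w0)
```

Because the meta-update operates only on the step size, the same wrapper pattern applies unchanged to other base optimizers such as Adam, which is the "wrap around any first-order algorithm" property the abstract emphasizes.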