🤖 AI Summary
Hyperparameter tuning for the Adam optimizer is computationally expensive, and tuned settings generalize poorly across tasks. Method: We propose a framework, Adam-PFN with CDF-augment, that jointly addresses these limitations. First, we introduce Adam-PFN, a pre-trained surrogate model designed specifically for Adam hyperparameter optimization, which transfers knowledge from TaskSet learning curves. Second, we design CDF-augment, a data augmentation strategy that uses cumulative distribution functions to generate additional learning curves, improving sample efficiency in freeze-thaw Bayesian optimization. Contribution/Results: Experiments show that our method improves learning curve extrapolation accuracy and speeds up hyperparameter convergence under low evaluation budgets, with strong robustness both in-distribution (on TaskSet) and out-of-distribution (on unseen tasks), enabling efficient, transferable optimizer hyperparameter tuning.
📝 Abstract
The Adam optimizer remains one of the most widely used optimizers in deep learning, and effectively tuning its hyperparameters is key to optimizing performance. However, tuning can be tedious and costly. Freeze-thaw Bayesian optimization (BO) is a promising recent approach to low-budget hyperparameter tuning, but it is limited by generic surrogates that lack prior knowledge of how hyperparameters affect learning. We propose Adam-PFN, a new surrogate model for freeze-thaw BO of Adam's hyperparameters, pre-trained on learning curves from TaskSet, together with a new learning curve augmentation method, CDF-augment, which artificially increases the number of available training examples. Our approach improves learning curve extrapolation and accelerates hyperparameter optimization on TaskSet evaluation tasks, with strong performance on out-of-distribution (OOD) tasks.
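The abstract does not spell out how CDF-augment works. As a purely illustrative sketch, assuming the augmentation warps a normalized learning curve through a random monotone CDF (a Kumaraswamy CDF is used here as a stand-in; the function name and parameters are hypothetical, not from the paper), the idea of generating new plausible curves from one observed curve might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def cdf_augment(curve, shape_range=(0.5, 2.0)):
    """Hypothetical CDF-style augmentation: normalize the curve to [0, 1],
    pass it through a random monotone CDF, and map it back. A monotone
    warp preserves the ordering of points, so an increasing learning
    curve stays increasing while its shape changes."""
    lo, hi = curve.min(), curve.max()
    norm = (curve - lo) / (hi - lo + 1e-12)      # normalize to [0, 1]
    a = rng.uniform(*shape_range)                # random Kumaraswamy shape params
    b = rng.uniform(*shape_range)
    warped = 1.0 - (1.0 - norm**a) ** b          # Kumaraswamy(a, b) CDF on [0, 1]
    return lo + warped * (hi - lo)               # map back to the original range

# Toy "accuracy" learning curve: saturating exponential over 50 epochs.
curve = 1.0 - np.exp(-0.1 * np.arange(50))
augmented = cdf_augment(curve)
```

Each call draws fresh shape parameters, so one observed curve yields many distinct but qualitatively similar training curves, which is the sample-efficiency benefit the abstract describes.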