Training neural networks faster with minimal tuning using pre-computed lists of hyperparameters for NAdamW

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Hyperparameter tuning for the NAdamW optimizer in deep learning is costly, and tuned settings generalize poorly across tasks. Method: We propose an off-the-shelf, precomputed hyperparameter list covering learning rate, weight decay, label smoothing, and dropout, derived from large-scale experiments on the realistic workloads of the AlgoPerf training benchmark. This yields a task-agnostic, empirically robust set of settings that can be tuned effectively with ≤5 trials. Contribution/Results: Under the same small budget (≤5 trials), our method outperforms basic learning rate/weight decay sweeps and an off-the-shelf Bayesian optimization tool, while remaining robust on held-out workloads not used to construct the list. To our knowledge, this is the first empirically validated, plug-and-play hyperparameter configuration specifically designed for NAdamW, enabling low-budget, broadly generalizable neural network training.

📝 Abstract
If we want to train a neural network using any of the most popular optimization algorithms, we are immediately faced with a dilemma: how to set the various optimization and regularization hyperparameters? When computational resources are abundant, there are a variety of methods for finding good hyperparameter settings, but when resources are limited the only realistic choices are using standard default values of uncertain quality and provenance, or tuning only a couple of the most important hyperparameters via extremely limited hand-designed sweeps. Extending the idea of default settings to a modest tuning budget, Metz et al. (2020) proposed using ordered lists of well-performing hyperparameter settings, derived from a broad hyperparameter search on a large library of training workloads. However, to date, no practical and performant hyperparameter lists that generalize to representative deep learning workloads have been demonstrated. In this paper, we present hyperparameter lists for NAdamW derived from extensive experiments on the realistic workloads in the AlgoPerf: Training Algorithms benchmark. Our hyperparameter lists also include values for basic regularization techniques (i.e. weight decay, label smoothing, and dropout). In particular, our best NAdamW hyperparameter list performs well on AlgoPerf held-out workloads not used to construct it, and represents a compelling turn-key approach to tuning when restricted to five or fewer trials. It also outperforms basic learning rate/weight decay sweeps and an off-the-shelf Bayesian optimization tool when restricted to the same budget.
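The tuning procedure the abstract describes can be sketched in a few lines: walk an ordered, precomputed list of hyperparameter settings, run one training trial per entry up to the budget, and keep the best. The sketch below is illustrative only; the hyperparameter values and the `toy_objective` stand-in are assumptions, not the paper's actual list or workloads.

```python
# Hypothetical sketch of tuning from a precomputed, ordered hyperparameter
# list under a small trial budget, in the spirit of the paper's approach.
# The values below are made up for illustration, NOT the paper's list.
HYPERPARAM_LIST = [
    {"lr": 1e-3, "weight_decay": 1e-2, "label_smoothing": 0.1, "dropout": 0.1},
    {"lr": 3e-4, "weight_decay": 1e-1, "label_smoothing": 0.0, "dropout": 0.0},
    {"lr": 1e-2, "weight_decay": 1e-3, "label_smoothing": 0.2, "dropout": 0.1},
    {"lr": 3e-3, "weight_decay": 1e-2, "label_smoothing": 0.0, "dropout": 0.3},
    {"lr": 1e-4, "weight_decay": 1e-1, "label_smoothing": 0.1, "dropout": 0.0},
]

def tune_with_list(train_and_eval, budget=5):
    """Try the first `budget` entries in order; return the best setting."""
    best_score, best_hparams = float("inf"), None
    for hparams in HYPERPARAM_LIST[:budget]:
        score = train_and_eval(hparams)  # e.g. final validation loss
        if score < best_score:
            best_score, best_hparams = score, hparams
    return best_hparams, best_score

# Toy objective standing in for a real training run (assumption: lower is
# better, minimized near lr=3e-4 with no dropout).
def toy_objective(h):
    return abs(h["lr"] - 3e-4) + h["dropout"]

best, score = tune_with_list(toy_objective)
```

Because the list is ordered by expected utility, truncating it at any budget still yields a sensible search; in practice `train_and_eval` would train the model with an NAdamW-style optimizer configured from each entry.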
Problem

Research questions and friction points this paper is trying to address.

Tuning hyperparameters for neural network training efficiently under tight compute budgets.
Providing pre-computed, ordered hyperparameter lists for the NAdamW optimizer.
Achieving strong performance with five or fewer tuning trials in deep learning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-computed hyperparameter lists for NAdamW
Extensive experiments on AlgoPerf benchmark
Outperforms basic sweeps and Bayesian optimization