Benchmarking Neural Network Training Algorithms

📅 2023-06-12
🏛️ arXiv.org
📈 Citations: 33 · Influential: 1
🤖 AI Summary
Fair evaluation of deep learning training algorithms faces three basic challenges: deciding when training is complete and measuring training time precisely, handling the sensitivity of results to exact workload details, and fairly accounting for hyperparameter tuning. This paper introduces AlgoPerf: Training Algorithms, a competitive, time-to-result benchmark that runs a diverse suite of workloads on fixed hardware, enforces a standardized termination protocol (train until a preset validation target is reached), and includes workload variants designed to reward robustness to workload changes. Baseline submissions built from optimizers representing current practice, as well as optimizers that have recently received attention in the literature, reveal non-trivial training-speed gaps between methods, establish reproducible provisional state-of-the-art results, and provide a quantitative, fair, and engineering-practical standard for evaluating training-algorithm improvements.
📝 Abstract
Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a community, we are currently unable to reliably identify training algorithm improvements, or even determine the state-of-the-art training algorithm. In this work, using concrete experiments, we argue that real progress in speeding up training requires new benchmarks that resolve three basic challenges faced by empirical comparisons of training algorithms: (1) how to decide when training is complete and precisely measure training time, (2) how to handle the sensitivity of measurements to exact workload details, and (3) how to fairly compare algorithms that require hyperparameter tuning. In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark. Our benchmark includes a set of workload variants that make it possible to detect benchmark submissions that are more robust to workload changes than current widely-used methods. Finally, we evaluate baseline submissions constructed using various optimizers that represent current practice, as well as other optimizers that have recently received attention in the literature. These baseline results collectively demonstrate the feasibility of our benchmark, show that non-trivial gaps between methods exist, and set a provisional state-of-the-art for future benchmark submissions to try and surpass.
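To make challenge (1) concrete: under a time-to-result protocol, a run is timed until it first reaches a fixed validation target, rather than being trained for a fixed number of steps or epochs. Below is a minimal Python sketch of that measurement loop; `train_step`, `evaluate`, and the evaluation cadence are illustrative placeholders, not the benchmark's actual API.

```python
import time

def time_to_result(train_step, evaluate, target, max_runtime_s, eval_every=100):
    """Train until a fixed validation target is reached.

    Returns wall-clock seconds to target, or None if the budget expires.
    Illustrative sketch only; not AlgoPerf's actual interface.
    """
    start = time.monotonic()
    step = 0
    while time.monotonic() - start < max_runtime_s:
        train_step()                              # one optimizer update
        step += 1
        if step % eval_every == 0 and evaluate() >= target:
            return time.monotonic() - start       # time-to-result for this run
    return None                                   # target not reached in budget

# Toy usage: a "model" whose validation metric improves slightly per step.
state = {"metric": 0.0}
t = time_to_result(
    train_step=lambda: state.update(metric=state["metric"] + 1e-4),
    evaluate=lambda: state["metric"],
    target=0.5,
    max_runtime_s=60.0,
)
print(f"reached target in {t:.4f}s" if t is not None else "budget exhausted")
```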
Problem

Research questions and friction points this paper is trying to address.

Identify state-of-the-art training algorithms reliably
Measure training time accurately and decide completion
Compare hyperparameter-tuned algorithms fairly (see the tuning-budget sketch after this list)
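To make the third point concrete, one simple way to control for tuning, in the spirit of the paper's external tuning setting, is to charge every submission the same fixed trial budget, sample hyperparameters from each submission's own declared search space, and score the best trial. The sketch below assumes log-uniform sampling and the helper names shown; neither is the benchmark's actual rule set.

```python
import math
import random

def tuned_time_to_result(search_space, run_trial, num_trials=20, seed=0):
    """Give every algorithm the same tuning budget and score the best trial.

    search_space: hyperparameter name -> (low, high), sampled log-uniformly.
    run_trial:    callable(hparams) -> time-to-target in seconds, or None
                  if the trial never reached the validation target.
    """
    rng = random.Random(seed)
    best = math.inf
    for _ in range(num_trials):
        hparams = {
            name: math.exp(rng.uniform(math.log(lo), math.log(hi)))
            for name, (lo, hi) in search_space.items()
        }
        t = run_trial(hparams)
        if t is not None:
            best = min(best, t)          # only successful trials count
    return best                          # math.inf if no trial hit the target

# Toy usage: a fake workload that trains fastest near learning rate 1e-2.
score = tuned_time_to_result(
    search_space={"learning_rate": (1e-5, 1.0)},
    run_trial=lambda hp: 100.0 * (1 + abs(math.log10(hp["learning_rate"]) + 2)),
)
print(f"best time-to-target: {score:.1f}s")
```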
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces competitive, multi-workload time-to-result benchmark (see the scoring sketch after this list)
Handles sensitivity via workload variants
Fairly compares hyperparameter-tuned algorithms
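Scoring a competitive, multi-workload benchmark also requires aggregating per-workload training times into one number; the paper uses performance profiles for this. The sketch below is a simplified, Dolan-Moré-style version operating on raw runtimes, not the benchmark's exact scoring procedure.

```python
import numpy as np

def performance_profiles(times, taus):
    """Dolan-Moré-style performance profiles.

    times: (n_submissions, n_workloads) array of times-to-target, with
           np.inf where a submission never reached the workload's target.
    taus:  slowdown factors at which to evaluate the profile.

    Returns (n_submissions, len(taus)): entry [s, t] is the fraction of
    workloads on which submission s is within a factor taus[t] of the
    fastest submission on that workload.
    """
    fastest = times.min(axis=0)                  # best time per workload
    ratios = times / fastest                     # per-cell slowdown factor
    return (ratios[:, None, :] <= taus[None, :, None]).mean(axis=2)

# Toy usage: submission 0 wins two workloads but fails the third (np.inf).
times = np.array([
    [100.0, 200.0, np.inf],
    [150.0, 220.0, 400.0],
])
taus = np.array([1.0, 1.5, 2.0])
print(performance_profiles(times, taus))
# Row 0 plateaus at 2/3: failed workloads never count as "within tau".
```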
👥 Authors
George E. Dahl
Google Inc.
Machine Learning · Computer Science · Artificial Intelligence · Deep Learning · Acoustic Modeling
Frank Schneider
Postdoctoral Researcher, University of Tübingen
Machine Learning · Deep Learning · Optimization · Training Methods · Artificial Intelligence
Zachary Nado
Google Brain
Machine Learning
Naman Agarwal
Senior Research Scientist, Google AI Princeton
Machine Learning Algorithms · Optimization · Control
Chandramouli Shama Sastry
Amazon
Robust ML · Generative Models · Deep Learning · NLP
Philipp Hennig
University of Tübingen
Probabilistic Numerics · Machine Learning · Computer Science
Sourabh Medapati
Google DeepMind
Deep Learning · Optimization Theory
Runa Eschenhagen
PhD student, University of Cambridge
Machine Learning
Priya Kasimbeg
Google DeepMind
Daniel Suo
Princeton University
Computer Vision
Juhan Bae
University of Toronto
Machine Learning
J. Gilmer
Google
A. L. Peirson
Stanford University
B. Khan
Google
Rohan Anil
Distinguished Engineer, Google DeepMind
Michael G. Rabbat
Meta AI (FAIR)
Shankar Krishnan
Google
Daniel Snider
Vector Institute, University of Toronto
Ehsan Amid
Research Scientist at Google DeepMind
Machine Learning · Tempered Exponential Measures · Online Learning · Dimensionality Reduction
Kongtao Chen
Google
Chris J. Maddison
University of Toronto
Machine Learning · Representation Learning · Bayesian Inference · Optimization
R. Vasudev
Dell Technologies
Michal Badura
Google
Ankush Garg
Google
Peter Mattson
Google