🤖 AI Summary
This study addresses a fundamental question: can improvements to the training algorithm alone yield practical speed-ups in neural network training? To answer it, we organized the inaugural AlgoPerf: Training Algorithms competition, which evaluates submissions under two rulesets. In the external tuning ruleset, each submission must supply a workload-agnostic hyperparameter search space. In the self-tuning ruleset, submissions must be completely hyperparameter-free. In both rulesets, the primary metric is time-to-result across multiple deep learning workloads, measured on fixed hardware, and we describe the engineering safeguards needed to keep this comparison fair. The competition drew 18 submissions from 10 teams. A submission using Distributed Shampoo (a non-diagonal preconditioner) won the external tuning ruleset, and a submission based on Schedule-Free AdamW (a hyperparameter-free optimizer) won the self-tuning ruleset. The top-scoring submissions proved robust to workload changes, indicating that better training algorithms can deliver substantial and reliable speed-ups while leaving considerable room for further improvement.
📝 Abstract
The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, while in the self-tuning ruleset they must be completely hyperparameter-free. In both rulesets, submissions are compared on time-to-result across multiple deep learning workloads, training on fixed hardware. This paper presents the results of the inaugural AlgoPerf competition, which drew 18 diverse submissions from 10 teams. Our investigation reveals several key findings: (1) The winning submission in the external tuning ruleset, using Distributed Shampoo, demonstrates the effectiveness of non-diagonal preconditioning over popular methods like Adam, even when compared on wall-clock runtime. (2) The winning submission in the self-tuning ruleset, based on the Schedule-Free AdamW algorithm, demonstrates a new level of effectiveness for completely hyperparameter-free training algorithms. (3) The top-scoring submissions were surprisingly robust to workload changes. We also discuss the engineering challenges encountered in ensuring a fair comparison between different training algorithms. These results highlight both the significant progress so far and the considerable room for further improvements.
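Comparing submissions "on time-to-result across multiple deep learning workloads" requires some way to fold per-workload times into a single ranking. The sketch below shows one such aggregation, modeled on the performance-profile scoring used by the AlgoPerf benchmark: each submission's time on a workload is divided by the fastest time any submission achieved there, and the score is the normalized area under the resulting profile up to a cutoff ratio. The function name `performance_profile_scores`, the nested-dict input layout, and the `tau_max = 4.0` default are illustrative assumptions, not the competition's actual scoring code.

```python
import math
from typing import Dict


def performance_profile_scores(
    times: Dict[str, Dict[str, float]],
    tau_max: float = 4.0,
) -> Dict[str, float]:
    """Score submissions by the normalized area under their performance profile.

    Hypothetical helper. times[submission][workload] is the wall-clock
    time-to-result in seconds; use math.inf when a submission never reached
    that workload's target.
    """
    if tau_max <= 1.0:
        raise ValueError("tau_max must be greater than 1")
    workloads = sorted({w for per_sub in times.values() for w in per_sub})
    if not workloads:
        return {name: 0.0 for name in times}

    # Fastest time-to-result that any submission achieved on each workload.
    best = {
        w: min(per_sub.get(w, math.inf) for per_sub in times.values())
        for w in workloads
    }

    scores = {}
    for name, per_sub in times.items():
        area = 0.0
        for w in workloads:
            t = per_sub.get(w, math.inf)
            if math.isinf(t):
                continue  # target never reached: contributes nothing
            ratio = t / best[w]  # >= 1, since best[w] is the minimum
            # rho(tau) counts workload w for every tau >= ratio, so its share
            # of the integral of rho over [1, tau_max] is (tau_max - ratio).
            area += max(0.0, tau_max - ratio)
        # Normalize so a submission that is fastest on every workload scores 1.0.
        scores[name] = area / (len(workloads) * (tau_max - 1.0))
    return scores


if __name__ == "__main__":
    print(performance_profile_scores({
        "fast": {"w1": 10.0, "w2": 20.0},
        "slow": {"w1": 20.0, "w2": 80.0},
    }))
```

In this example, `fast` is best on both workloads and scores 1.0, while `slow` is 2x slower on `w1` and 4x slower on `w2`, so only `w1` contributes and it scores 1/3. Because every ratio is taken against the fastest submission, the scores are relative to the field rather than absolute, and a submission that misses a target simply loses that workload's share of the area.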