Training Neural Networks at Any Scale

📅 2025-11-14
📈 Citations: 0
✹ Influential: 0
đŸ€– AI Summary
To address the low training efficiency, poor generalization, and strong hyperparameter sensitivity of neural networks across varying scales, this paper proposes a scale-invariant adaptive optimization framework. The method unifies adaptive optimization, second-order information approximation, learning-rate scaling invariance, and gradient compression, thereby decoupling optimization from model size and hardware configuration. Its core innovation is a scale-robust update paradigm that keeps optimization dynamics stable under variations in parameter count, batch size, and device count. Extensive experiments across diverse architectures (MLPs, CNNs, and Transformers) and benchmarks (CIFAR-10/100, ImageNet, and WikiText) show that the framework achieves a 1.3–2.1× speedup over baseline optimizers, improves convergence stability, and markedly reduces hyperparameter sensitivity, eliminating the need for scale-specific hyperparameter tuning.
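The summary names a "scale-robust update paradigm" without spelling out the rule. As a rough sketch of the general idea only, the step below ties the effective step size to the parameter's own norm via a LARS/LAMB-style trust ratio, which is an assumption of this sketch and not the paper's stated update; the function name and constants are hypothetical.

```python
import torch

def scale_robust_step(param: torch.Tensor, grad: torch.Tensor,
                      lr: float = 0.1, eps: float = 1e-8) -> None:
    """Illustrative scale-robust update: the effective step size tracks
    the parameter's norm rather than the raw gradient magnitude."""
    # Trust-ratio normalization (an assumption, not the paper's rule):
    # layers with larger weights take proportionally larger steps, so
    # the dynamics are insensitive to layer scale.
    trust = param.norm() / (grad.norm() + eps)
    param.add_(grad, alpha=-(lr * trust.item()))
```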

📝 Abstract
This article reviews modern optimization methods for training neural networks with an emphasis on efficiency and scale. We present state-of-the-art optimization algorithms under a unified algorithmic template that highlights the importance of adapting to the structures in the problem. We then cover how to make these algorithms agnostic to the scale of the problem. Our exposition is intended as an introduction for both practitioners and researchers who wish to be involved in these exciting new developments.
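The abstract mentions a unified algorithmic template but the page does not reproduce it. A minimal sketch of what such a template can look like, assuming the common decomposition into a shared update loop plus a per-method direction map (an assumption of this sketch, not the article's actual template):

```python
import torch
from typing import Callable, Iterable

# A "direction map" turns (parameter, gradient) into an update direction;
# concrete optimizers differ only in this choice.
Direction = Callable[[torch.Tensor, torch.Tensor], torch.Tensor]

def template_step(params: Iterable[torch.Tensor],
                  grads: Iterable[torch.Tensor],
                  direction: Direction, lr: float) -> None:
    """Shared loop of the template: apply the chosen direction map."""
    for p, g in zip(params, grads):
        p.add_(direction(p, g), alpha=-lr)

# Two illustrative instantiations (hypothetical, not from the article):
sgd_dir = lambda p, g: g           # plain gradient descent
sign_dir = lambda p, g: g.sign()   # signSGD-style direction, adapted to
                                   # an l-infinity geometry
```

Under such a template, structure-adaptive methods become interchangeable plug-ins rather than separate algorithms.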
Problem

Research questions and friction points this paper is trying to address.

Optimizing neural network training for efficiency and scale
Developing algorithms adaptable to problem structures
Making optimization methods independent of problem size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified algorithmic template for optimization methods
Adapting algorithms to problem structures
Scale-agnostic neural network training techniques (see the learning-rate scaling sketch below)
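As a concrete taste of the scale-agnostic ingredient above, the sketch below scales per-layer learning rates with fan-in, one common muP-style convention for making a tuned base rate transfer across widths. The helper and its 1/fan_in rule are illustrative assumptions, not the article's prescription.

```python
import torch

def fan_in_scaled_groups(model: torch.nn.Module, base_lr: float = 1e-3):
    """Build optimizer parameter groups whose learning rates shrink as
    1 / fan_in for weight matrices, so a single tuned base_lr behaves
    similarly across model widths. (Simplified illustrative rule.)"""
    groups = []
    for p in model.parameters():
        if p.ndim >= 2:
            # For a dense weight of shape (out_features, in_features),
            # fan_in is dim 1; convolutions would also fold in kernel size.
            groups.append({"params": [p], "lr": base_lr / p.shape[1]})
        else:
            groups.append({"params": [p], "lr": base_lr})  # biases, norms
    return groups

# Usage: opt = torch.optim.SGD(fan_in_scaled_groups(model), lr=1e-3)
```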
đŸ‘„ Authors

Thomas Pethick
PhD, EPFL

Kimon Antonakopoulos
LIONS-EPFL
Convex Optimization · Continuous Optimization · Variational Inequalities

A. Silveti-Falls
CVN, CentraleSupélec, Université Paris-Saclay, Inria

Leena Chennuru Vankadara
Gatsby Unit, UCL
Deep learning theory · Causal learning theory · High-dimensional statistics

V. Cevher
École Polytechnique FĂ©dĂ©rale de Lausanne (EPFL), Laboratory for Information and Inference Systems (LIONS)