Optimizing ML Training with Metagradient Descent

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of efficiently optimizing the high-dimensional configuration space of large-scale machine learning training, this paper proposes a scalable metagradient computation algorithm and a smooth model training (SMT) framework, enabling end-to-end, gradient-based optimization of the training setup. Methodologically, it combines reverse-mode automatic differentiation through the training loop, smooth modeling of the training procedure, and metagradient descent (MGD), applied to data selection, accuracy-degrading data poisoning, and learning rate scheduling. Key contributions are: (1) a scalable algorithm for computing metagradients through large-scale training runs; and (2) the SMT framework, which makes these metagradients stable and useful for optimization under realistic training conditions. Experiments show that MGD-based data selection significantly outperforms existing approaches; MGD-crafted accuracy-degrading data poisoning attacks outperform prior attacks by an order of magnitude; and the automatically discovered learning rate schedules match or exceed hand-crafted designs.
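For intuition, here is a minimal sketch of what a metagradient is: the gradient of a post-training validation loss with respect to per-example training weights, obtained by reverse-mode autodiff through the training loop itself. This assumes a toy linear model, a fully unrolled SGD loop, and JAX; names such as train_then_evaluate are illustrative, and none of the paper's scalable machinery or smoothing is shown.

```python
import jax
import jax.numpy as jnp

def train_loss(params, x, y, weights):
    # Weighted training loss; `weights` are the metaparameters.
    preds = x @ params
    return jnp.mean(weights * (preds - y) ** 2)

def val_loss(params, x_val, y_val):
    preds = x_val @ params
    return jnp.mean((preds - y_val) ** 2)

def train_then_evaluate(weights, x_tr, y_tr, x_val, y_val, lr=0.1, steps=100):
    # Inner loop: plain SGD on the weighted training loss.
    params = jnp.zeros(x_tr.shape[1])
    grad_fn = jax.grad(train_loss)
    for _ in range(steps):
        params = params - lr * grad_fn(params, x_tr, y_tr, weights)
    # Maps metaparameters (data weights) -> final model quality.
    return val_loss(params, x_val, y_val)

# Metagradient: d(validation loss after training) / d(per-example weights),
# computed by reverse-mode autodiff through the unrolled training loop.
metagrad_fn = jax.grad(train_then_evaluate)

key = jax.random.PRNGKey(0)
x_tr = jax.random.normal(key, (64, 4))
y_tr = x_tr @ jnp.array([1.0, -2.0, 0.5, 3.0])
x_val, y_val = x_tr[:16], y_tr[:16]

metagrads = metagrad_fn(jnp.ones(64), x_tr, y_tr, x_val, y_val)
print(metagrads.shape)  # (64,): one score per training example
```

Examples whose weight has a large negative metagradient are the ones whose upweighting most reduces validation loss, which is the basic signal behind gradient-based data selection.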

📝 Abstract
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
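The abstract's learning-rate-schedule result corresponds to running an outer gradient-descent loop on the schedule itself, driven by metagradients. Below is a hedged sketch under the same toy assumptions as above (linear model, unrolled inner loop, JAX autodiff); the log_lrs parameterization and the outer step size are illustrative choices, not the paper's setup.

```python
import jax
import jax.numpy as jnp

def inner_train(log_lrs, x_tr, y_tr):
    # SGD where step t uses learning rate exp(log_lrs[t]).
    params = jnp.zeros(x_tr.shape[1])
    loss = lambda p: jnp.mean((x_tr @ p - y_tr) ** 2)
    grad_fn = jax.grad(loss)
    for t in range(log_lrs.shape[0]):
        params = params - jnp.exp(log_lrs[t]) * grad_fn(params)
    return params

def meta_objective(log_lrs, x_tr, y_tr, x_val, y_val):
    params = inner_train(log_lrs, x_tr, y_tr)
    return jnp.mean((x_val @ params - y_val) ** 2)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (128, 8))
y = x @ jax.random.normal(jax.random.PRNGKey(1), (8,))
x_tr, y_tr, x_val, y_val = x[:96], y[:96], x[96:], y[96:]

log_lrs = jnp.full((50,), jnp.log(0.05))       # metaparameter: flat initial schedule
metagrad = jax.jit(jax.grad(meta_objective))   # metagradient via reverse mode

for _ in range(20):                             # outer MGD loop
    log_lrs = log_lrs - 0.1 * metagrad(log_lrs, x_tr, y_tr, x_val, y_val)

print(jnp.exp(log_lrs))                         # discovered learning-rate schedule
```

Parameterizing the schedule in log space is just a convenient way to keep learning rates positive during the outer updates; at real scale, naively unrolling the inner loop like this is what the paper's efficient metagradient algorithm and smoothing framework are designed to avoid.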
Problem

Research questions and friction points this paper is trying to address.

Optimizing training setup for large-scale ML models
Efficiently calculating metagradients for model training
Improving dataset selection and learning rate schedules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient metagradient calculation at scale
Smooth model training framework for effective metagradient-based optimization
Metagradient descent improves dataset selection
Authors
Logan Engstrom (MIT, Computer Science)
Andrew Ilyas (MIT, Computer Science)
Benjamin Chen (MIT)
Axel Feldmann (MIT)
William Moses (UIUC)
Aleksander Madry (MIT)