Gradient Flow Matching for Learning Update Dynamics in Neural Network Training

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

Training deep neural networks relies on iterative gradient-based optimization, incurring substantial computational overhead. This work models the training process as an optimizer-aware continuous dynamical system and introduces a conditional flow matching framework that explicitly incorporates structural priors of common optimizers—such as SGD, Adam, and RMSProp—into the vector field learning objective. By doing so, the method enables predictive estimation of convergence weights directly from partial training trajectories. Unlike black-box sequential models, our approach achieves cross-architecture and cross-initialization generalization of weight trajectory prediction—a first in the literature. Experiments demonstrate that our method matches Transformer-based predictors in convergence accuracy while significantly outperforming LSTM and other baselines. Crucially, it maintains strong generalization across diverse network architectures and initializations, and exhibits practical potential for accelerating real-world training.

Technology Category

Application Category

📝 Abstract

Training deep neural networks remains computationally intensive due to the itera2 tive nature of gradient-based optimization. We propose Gradient Flow Matching (GFM), a continuous-time modeling framework that treats neural network training as a dynamical system governed by learned optimizer-aware vector fields. By leveraging conditional flow matching, GFM captures the underlying update rules of optimizers such as SGD, Adam, and RMSprop, enabling smooth extrapolation of weight trajectories toward convergence. Unlike black-box sequence models, GFM incorporates structural knowledge of gradient-based updates into the learning objective, facilitating accurate forecasting of final weights from partial training sequences. Empirically, GFM achieves forecasting accuracy that is competitive with Transformer-based models and significantly outperforms LSTM and other classical baselines. Furthermore, GFM generalizes across neural architectures and initializations, providing a unified framework for studying optimization dynamics and accelerating convergence prediction.

Problem

Research questions and friction points this paper is trying to address.

Modeling neural network training as a dynamical system

Learning optimizer-aware vector fields for update rules

Forecasting weight trajectories and accelerating convergence prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

GFM models training as dynamical system

Leverages conditional flow matching

Incorporates gradient-based structural knowledge

🔎 Similar Papers

Does SGD really happen in tiny subspaces?