🤖 AI Summary
Existing learning curve prediction methods typically treat training dynamics as isolated time series, neglecting how a network's architecture shapes its loss landscape, which limits predictive accuracy. This paper proposes an architecture-aware continuous learning curve modeling framework: it is the first to incorporate neural architecture encodings into a Neural Ordinary Differential Equation (Neural ODE) dynamical system, combining graph neural networks with variational inference to enable architecture-driven loss trajectory modeling, early-stage extrapolation, and uncertainty quantification. On both MLP and CNN learning curves, the method significantly outperforms state-of-the-art approaches; in neural architecture search (NAS), it improves training configuration ranking, accelerating hyperparameter optimization and architecture discovery. The core contribution is a unified architecture-dynamics co-modeling paradigm that overcomes fundamental limitations of conventional static or discrete modeling strategies.
📝 Abstract
Learning curve extrapolation predicts neural network performance from early training epochs and has been applied to accelerate AutoML, facilitating hyperparameter tuning and neural architecture search. However, existing methods typically model the evolution of learning curves in isolation, neglecting the impact of neural network architectures, which influence the loss landscape and learning trajectories. In this work, we explore whether incorporating neural network architecture improves learning curve modeling and how to effectively integrate this architectural information. Motivated by the dynamical-system view of optimization, we propose a novel architecture-aware neural differential equation model to forecast learning curves continuously. We empirically demonstrate its ability to capture the general trend of fluctuating learning curves while quantifying uncertainty through variational parameters. Our model outperforms current state-of-the-art learning curve extrapolation methods and pure time-series modeling approaches for both MLP- and CNN-based learning curves. Additionally, we explore the applicability of our method in neural architecture search scenarios, such as training configuration ranking.
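The abstract's core idea is to treat the learning curve as the solution of an ODE whose dynamics are conditioned on an encoding of the architecture. The following is a minimal NumPy sketch of that dynamical-system view, not the paper's model: `encode_architecture`, `dynamics`, and the linear map `W` are illustrative stand-ins (the paper uses a graph neural network encoder and variational inference, neither of which is reproduced here), and forward Euler substitutes for a learned Neural ODE solver.

```python
import numpy as np

def encode_architecture(layer_widths):
    # Toy architecture "encoding": summary statistics of layer widths.
    # (A stand-in for the paper's graph neural network encoder.)
    w = np.asarray(layer_widths, dtype=float)
    return np.array([w.mean() / 100.0, len(w) / 10.0])

def dynamics(z, arch_emb, W):
    # dz/dt = f(z, e): the loss z decays at a rate determined by the
    # architecture embedding e through a small linear map W.
    rate = np.log1p(np.exp(W @ arch_emb))  # softplus -> positive rate
    return -rate * z

def extrapolate_curve(z0, arch_emb, W, t_max=10.0, dt=0.1):
    # Forward-Euler integration of the learning-curve ODE: given the
    # loss at an early epoch, roll the dynamics forward in time.
    zs = [z0]
    z = z0
    for _ in range(int(t_max / dt)):
        z = z + dt * dynamics(z, arch_emb, W)
        zs.append(z)
    return np.array(zs)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=2)       # untrained, illustrative weights
emb = encode_architecture([64, 128, 64])
curve = extrapolate_curve(z0=2.3, arch_emb=emb, W=W)
print(curve[0], curve[-1])
```

Because the embedding enters the dynamics, two different architectures yield different extrapolated trajectories from the same initial loss, which is the property the ranking experiments in the paper exploit.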