🤖 AI Summary
Existing learning curve prediction methods typically treat training dynamics as isolated time series, neglecting how a network's architecture shapes its loss landscape, which limits predictive accuracy. This paper proposes an architecture-aware continuous learning curve modeling framework: it is the first to incorporate neural architecture encodings into a Neural Ordinary Differential Equation (Neural ODE) dynamical system, combining graph neural networks with variational inference to enable architecture-driven loss trajectory modeling, early-stage extrapolation, and uncertainty quantification. On both MLP and CNN learning curves, the method significantly outperforms state-of-the-art approaches; in neural architecture search (NAS), it improves training configuration ranking, accelerating hyperparameter optimization and architecture discovery. The core contribution is a unified architecture-dynamics co-modeling paradigm that overcomes fundamental limitations of conventional static or discrete modeling strategies.
📝 Abstract
Learning curve extrapolation predicts neural network performance from early training epochs and has been applied to accelerate AutoML, facilitating hyperparameter tuning and neural architecture search. However, existing methods typically model the evolution of learning curves in isolation, neglecting the impact of neural network architectures, which influence the loss landscape and learning trajectories. In this work, we explore whether incorporating neural network architecture improves learning curve modeling and how to effectively integrate this architectural information. Motivated by the dynamical-system view of optimization, we propose a novel architecture-aware neural differential equation model to forecast learning curves continuously. We empirically demonstrate its ability to capture the general trend of fluctuating learning curves while quantifying uncertainty through variational parameters. Our model outperforms current state-of-the-art learning curve extrapolation methods and pure time-series modeling approaches for both MLP- and CNN-based learning curves. Additionally, we explore the applicability of our method in neural architecture search scenarios, such as training configuration ranking.
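The abstract's core idea is to treat the learning curve as the solution of an ODE whose dynamics are conditioned on an encoding of the architecture. The following is a minimal NumPy sketch of that dynamical-system view, not the paper's model: `encode_architecture`, `dynamics`, and the linear map `W` are illustrative stand-ins (the paper uses a graph neural network encoder and variational inference, neither of which is reproduced here), and forward Euler substitutes for a learned Neural ODE solver.

```python
import numpy as np

def encode_architecture(layer_widths):
    # Toy architecture "encoding": summary statistics of layer widths.
    # (A stand-in for the paper's graph neural network encoder.)
    w = np.asarray(layer_widths, dtype=float)
    return np.array([w.mean() / 100.0, len(w) / 10.0])

def dynamics(z, arch_emb, W):
    # dz/dt = f(z, e): the loss z decays at a rate determined by the
    # architecture embedding e through a small linear map W.
    rate = np.log1p(np.exp(W @ arch_emb))  # softplus -> positive rate
    return -rate * z

def extrapolate_curve(z0, arch_emb, W, t_max=10.0, dt=0.1):
    # Forward-Euler integration of the learning-curve ODE: given the
    # loss at an early epoch, roll the dynamics forward in time.
    zs = [z0]
    z = z0
    for _ in range(int(t_max / dt)):
        z = z + dt * dynamics(z, arch_emb, W)
        zs.append(z)
    return np.array(zs)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=2)       # untrained, illustrative weights
emb = encode_architecture([64, 128, 64])
curve = extrapolate_curve(z0=2.3, arch_emb=emb, W=W)
print(curve[0], curve[-1])
```

Because the embedding enters the dynamics, two different architectures yield different extrapolated trajectories from the same initial loss, which is the property the ranking experiments in the paper exploit.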