MONET: Modeling and Optimization of neural NEtwork Training from Edge to Data Centers

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing hardware-software co-design tools struggle to accurately model memory consumption and backward-pass complexity in neural network training. This work proposes the first extension of the experimentally validated inference modeling framework, Stream, to the training domain, introducing a comprehensive framework for modeling and optimizing training on heterogeneous dataflow accelerators. The framework supports training workflow modeling, exploration of layer fusion configurations, and optimization of activation checkpointing strategies. Integrated with a genetic algorithm for hardware architecture search, it is validated on ResNet-18 and a small-scale GPT-2 model, effectively uncovering critical trade-offs between performance and memory in training-specific hardware design and identifying superior architectures and training strategies.

📝 Abstract
While hardware-software co-design has significantly improved the efficiency of neural network inference, modeling the training phase remains a critical yet underexplored challenge. Training workloads impose distinct constraints, particularly regarding memory footprint and backpropagation complexity, which existing inference-focused tools fail to capture. This paper introduces MONET, a framework designed to model the training of neural networks on heterogeneous dataflow accelerators. MONET builds upon Stream, an experimentally verified framework that models the inference of neural networks on heterogeneous dataflow accelerators with layer fusion. Using MONET, we explore the design space of ResNet-18 and a small GPT-2, demonstrating the framework's capability to model training workflows and find better hardware architectures. We then further examine problems that become more complex in neural network training due to the larger design space, such as determining the best layer-fusion configuration. Additionally, we use our framework to find interesting trade-offs in activation checkpointing, with the help of a genetic algorithm. Our findings highlight the importance of a holistic approach to hardware-software co-design for scalable and efficient deep learning deployment.
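The activation-checkpointing trade-off the abstract mentions can be made concrete with a toy cost model. The sketch below is illustrative only, not MONET's actual model: it assumes N uniform layers with unit activation size and unit forward cost, and stores one checkpoint per segment of `segment` layers, recomputing the rest during the backward pass.

```python
# Toy model of the activation-checkpointing memory/compute trade-off.
# Assumptions (not from the paper): N uniform layers, unit activation
# size, unit forward cost per layer.

def checkpoint_tradeoff(n_layers: int, segment: int):
    """Store one activation per segment boundary; recompute the rest.

    Returns (peak_activation_memory, extra_forward_cost).
    """
    n_segments = -(-n_layers // segment)  # ceiling division
    # Peak memory: one stored checkpoint per segment, plus the activations
    # of the single segment being recomputed during the backward pass.
    peak_memory = n_segments + segment
    # Extra compute: every non-checkpointed layer in each segment is
    # forwarded a second time during recomputation.
    extra_cost = n_segments * (segment - 1)
    return peak_memory, extra_cost

# Sweeping the segment size exposes the trade-off that a search procedure
# (e.g. the genetic algorithm used in the paper) would navigate:
for k in (1, 4, 8, 16, 64):
    mem, cost = checkpoint_tradeoff(64, k)
    print(f"segment={k:3d}  peak_mem={mem:3d}  extra_forward={cost:3d}")
```

For 64 layers, memory is minimized near segment sizes around the square root of the layer count (here, 8), while extra recomputation grows with segment size; real accelerator memory hierarchies and non-uniform layer costs make the landscape far less regular, which is where a modeling framework earns its keep.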
Problem

Research questions and friction points this paper is trying to address.

neural network training
hardware-software co-design
memory footprint
backpropagation complexity
heterogeneous accelerators
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-aware modeling
heterogeneous dataflow accelerators
layer fusion optimization
activation checkpointing
hardware-software co-design