Growth strategies for arbitrary DAG neural architectures

📅 2025-01-22
🤖 AI Summary
To address the high computational cost and environmental impact of ever-larger deep learning models, this paper proposes a method for online, dynamic expansion of arbitrary directed acyclic graph (DAG)-structured neural architectures during training. The approach identifies expressivity bottlenecks using gradient information from backpropagation, enabling goal-directed structural growth, and combines growth with strategies that steer the network toward more parameter-efficient architectures, jointly targeting training and inference efficiency. The result is an interpretable, progressive architectural evolution: the network topology adapts transparently, step by step, as training proceeds.

📝 Abstract
Deep learning has shown impressive results, obtained at the cost of training huge neural networks. However, the larger the architecture, the higher the computational, financial, and environmental costs during training and inference. We aim to reduce both training and inference time. We focus on Neural Architecture Growth, which can increase the size of a small model when needed, directly during training, using information from backpropagation. We expand existing work and freely grow neural networks in the form of any Directed Acyclic Graph by reducing expressivity bottlenecks in the architecture. We explore strategies to reduce excessive computations and to steer network growth toward more parameter-efficient architectures.
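A key ingredient of growing a network mid-training is that the expansion must not disturb what the model has already learned. The sketch below illustrates this idea on a toy one-hidden-layer ReLU network in NumPy: a new hidden neuron is added with zero outgoing weights, so the computed function is unchanged at the moment of growth, and the new capacity is only exploited by subsequent gradient steps. This is a minimal, hypothetical illustration of function-preserving growth; the paper's actual expressivity-bottleneck criterion for choosing where and how to grow is not reproduced here (a random probe stands in for the new neuron's incoming weights).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer ReLU network: x -> h = relu(W1 @ x) -> y = W2 @ h
W1 = 0.5 * rng.normal(size=(4, 3))   # hidden width 4, input dim 3
W2 = 0.5 * rng.normal(size=(1, 4))

def forward(x, W1, W2):
    h = np.maximum(W1 @ x, 0.0)
    return W2 @ h

def grow_hidden(W1, W2, new_in_weights):
    """Add one hidden neuron function-preservingly: its outgoing
    weights start at zero, so the network output is unchanged."""
    W1_grown = np.vstack([W1, new_in_weights[None, :]])
    W2_grown = np.hstack([W2, np.zeros((W2.shape[0], 1))])
    return W1_grown, W2_grown

x = rng.normal(size=3)
y_before = forward(x, W1, W2)

# Hypothetical placeholder for a bottleneck-derived direction:
# a random vector serves as the new neuron's incoming weights.
W1, W2 = grow_hidden(W1, W2, rng.normal(size=3))
y_after = forward(x, W1, W2)

# Growth preserved the function; training then updates the new weights.
assert np.allclose(y_before, y_after)
print("hidden width after growth:", W1.shape[0])
```

In an actual growth method, the incoming weights of the new neuron would be chosen from backpropagation statistics (e.g., the direction that best reduces the measured bottleneck) rather than at random; the zero-initialized outgoing weights are what make the insertion safe to perform at any point during training.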
Problem

Research questions and friction points this paper is trying to address.

Neural Network Efficiency
Environmental Sustainability
Complex Network Structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Expansion
Backpropagation Optimization
Efficient Computation
Stella Douka
TAU team, LISN, Université Paris-Saclay, CNRS, Inria, 91405 Orsay, France
Manon Verbockhaven
TAU team, LISN, Université Paris-Saclay, CNRS, Inria, 91405 Orsay, France
Théo Rudkiewicz
TAU team, LISN, Université Paris-Saclay, CNRS, Inria, 91405 Orsay, France
Stéphane Rivaud
TAU team, LISN, Université Paris-Saclay, CNRS, Inria, 91405 Orsay, France
Francois P. Landes
TAU team, LISN, Université Paris-Saclay, CNRS, Inria, 91405 Orsay, France
Sylvain Chevallier
LISN - Université Paris-Saclay, France
Research interests: open science, frugal learning, transfer learning, Riemannian geometry, biosignals
Guillaume Charpiat
INRIA (Saclay)
Research interests: artificial intelligence, statistical learning, computer vision, shape statistics, optimization