Exact Gauss-Newton Optimization for Training Deep Neural Networks

📅 2024-05-23
🏛️ Neurocomputing
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
In deep neural network training, first-order optimizers suffer from slow convergence when parameter dimensions vastly exceed batch sizes, while second-order methods incur prohibitive computational costs. To address this, we propose the Efficient Gauss-Newton (EGN) optimizer. EGN employs a Generalized Gauss-Newton (GGN) Hessian approximation and, as its key novelty, applies the Duncan-Guttman identity to obtain an exact low-rank decomposition of the mini-batch GGN matrix, so that the second-order descent direction can be computed efficiently. It seamlessly integrates backtracking line search, adaptive regularization, and momentum. Theoretically, EGN is guaranteed to converge linearly to an ε-stationary point. Empirically, across supervised learning and reinforcement learning benchmarks, EGN consistently matches or surpasses the generalization performance of well-tuned SGD, Adam, and SGN, with only moderate additional computational overhead.

📝 Abstract
We present EGN, a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges to an $\epsilon$-stationary point at a linear rate. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at most matches the generalization performance of well-tuned SGD, Adam, and SGN optimizers across various supervised and reinforcement learning tasks.
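The batch-sized factorization the abstract describes can be illustrated with the push-through identity $(J^\top J + \lambda I_n)^{-1} J^\top = J^\top (J J^\top + \lambda I_m)^{-1}$, a special case of the Duncan-Guttman family: for a mini-batch Jacobian with batch size $m \ll n$ parameters, the regularized Gauss-Newton step only requires solving an $m \times m$ system. The sketch below is a generic illustration of this idea for a least-squares GGN (identity loss Hessian), not the authors' EGN implementation; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 1000                       # batch size m << parameter dimension n
J = rng.standard_normal((m, n))      # mini-batch Jacobian (m x n)
r = rng.standard_normal(m)           # mini-batch residuals
lam = 1e-2                           # Levenberg-Marquardt regularizer

# Naive step: factorize the n x n regularized GGN matrix J^T J + lam*I_n
d_naive = -np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)

# Batch-sized step: push-through identity reduces this to an m x m solve
d_fast = -J.T @ np.linalg.solve(J @ J.T + lam * np.eye(m), r)

# Both directions coincide, but the second costs O(m^2 n) instead of O(n^3)
assert np.allclose(d_naive, d_fast)
```

The $m \times m$ solve is what makes a per-iteration exact second-order direction affordable when $n$ is in the millions.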
Problem

Research questions and friction points this paper is trying to address.

Optimizes deep neural networks using exact Gauss-Newton method
Computes efficient descent directions via low-rank matrix factorization
Improves convergence and generalization over common optimization algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic second-order optimization using Gauss-Newton approximation
Leverages matrix identity for mini-batch sized factorization
Integrates line search and momentum for accelerated convergence
Mikalai Korbit
DYSCO (Dynamical Systems, Control, and Optimization), IMT School for Advanced Studies Lucca, Italy
Adeyemi D. Adeoye
DYSCO (Dynamical Systems, Control, and Optimization), IMT School for Advanced Studies Lucca, Italy
Alberto Bemporad
Professor of Control Systems, IMT Lucca, Italy
control systems, model predictive control, automotive control, quadratic programming, nonlinear system identification
Mario Zanon
IMT Lucca
Model Predictive Control, Optimal Control, Optimization