🤖 AI Summary
To address the limitations of manual learning-rate tuning, slow convergence, and the lack of adaptivity in first-order optimizers, this paper proposes the Generalized Newton method (GeN), a Hessian-informed framework that plugs into any first-order optimizer (e.g., SGD, Adam). GeN uses a few additional forward passes to estimate the local curvature of the loss along the update direction, from which it automatically and dynamically derives the learning rate without extra hyperparameters or schedulers. Theoretically, it covers the Newton-Raphson method as a special case, bringing second-order information into the update while retaining the computational efficiency of first-order methods. Empirically, GeN matches state-of-the-art performance on language-modeling (GPT) and vision (ResNet) benchmarks and accelerates convergence. Its overhead, in both training time and GPU memory, is negligible once amortized over many iterations.
📝 Abstract
We propose the generalized Newton's method (GeN) -- a Hessian-informed approach that applies to any optimizer such as SGD and Adam, and covers the Newton-Raphson method as a sub-case. Our method automatically and dynamically selects the learning rate that accelerates the convergence, without the intensive tuning of the learning rate scheduler. In practice, our method is easily implementable, since it only requires additional forward passes with almost zero computational overhead (in terms of training time and memory cost) once the overhead is amortized over many iterations. We present extensive experiments on language and vision tasks (e.g., GPT and ResNet) to showcase that GeN optimizers match the state-of-the-art performance, which was achieved with carefully tuned learning rate schedulers.
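The core idea described above, probing the loss with a few extra forward passes along the optimizer's update direction, fitting a local quadratic in the step size, and taking the parabola's minimizer as the learning rate, can be sketched on a toy problem. This is an illustrative reconstruction, not the authors' code; the toy loss, probe spacing `delta`, and plain-SGD direction are assumptions for the example. On an exactly quadratic loss the fitted step recovers the Newton-optimal step size along the direction.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta. Because the loss is
# exactly quadratic, the quadratic fit below recovers the optimal step size.
A = np.diag([1.0, 4.0, 9.0])
theta = np.array([1.0, 1.0, 1.0])

def loss(t):
    return 0.5 * t @ A @ t

# First-order update direction: plain SGD uses the gradient itself.
g = A @ theta
v = g

# GeN-style learning-rate selection (sketch): probe the loss at a few step
# sizes along v via extra forward passes, fit L(eta) ~ a*eta^2 + b*eta + c,
# and take the parabola's minimizer as the learning rate.
delta = 0.1
etas = np.array([-delta, 0.0, delta])
losses = np.array([loss(theta - e * v) for e in etas])
a, b, c = np.polyfit(etas, losses, 2)  # coefficients, highest degree first
eta_star = -b / (2 * a)                # minimizer of the fitted parabola

theta_new = theta - eta_star * v
print(f"eta* = {eta_star:.4f}, loss {loss(theta):.3f} -> {loss(theta_new):.3f}")
```

For this loss the closed-form optimal step along `v` is `(g @ v) / (v @ A @ v)`, and the three-point fit reproduces it; on a real network the quadratic is only a local model, so the fitted step is a dynamic approximation refreshed as training proceeds.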