🤖 AI Summary
Higher-order optimizers are rarely used in machine-learning training because of their prohibitive computational overhead. Method: This paper systematically analyzes the impact of finite-precision arithmetic on the convergence of Newton's method, establishing the first rigorous convergence theorems for mixed-precision variants (including quasi-Newton and inexact Newton methods) and enabling quantitative a priori bounds on the achievable solution accuracy. It further proposes GNₖ, a generalized Gauss-Newton method that drastically reduces computational cost by computing only a subset of second-order derivatives while matching full Newton performance on regression tasks. Results: Experiments demonstrate that GNₖ outperforms Adam on standard benchmarks while incurring significantly lower computational overhead than conventional second-order methods. By pairing theoretical guarantees with an efficient algorithmic design, this work advances the practical deployment of higher-order optimization.
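The mixed-precision idea can be illustrated with a minimal sketch: form and factor the Hessian in low precision (here float32) while accumulating the iterate in high precision. This is only an illustration of the general technique; the paper's actual precision assignments and algorithms are not reproduced here, and the quadratic test problem is invented for the example.

```python
import numpy as np

def mixed_precision_newton_step(grad, hess, x, low=np.float32):
    """One Newton step with the Hessian stored and solved in low precision.

    Illustrative sketch only: the gradient and the iterate stay in the
    working (high) precision, mimicking a mixed-precision inexact Newton step.
    """
    g = grad(x)                          # gradient in high precision
    H = hess(x).astype(low)              # Hessian rounded to low precision
    # Solve H p = g in low precision, then update in high precision.
    p = np.linalg.solve(H, g.astype(low)).astype(x.dtype)
    return x - p

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.zeros(2)
for _ in range(3):
    x = mixed_precision_newton_step(lambda v: A @ v - b, lambda v: A, x)
print(x)  # close to np.linalg.solve(A, b), limited by float32 rounding
```

On this quadratic each step is exact up to float32 rounding, so the iterates stall at an accuracy floor set by the low precision, which is exactly the kind of a priori accuracy bound the convergence theorems quantify.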
📝 Abstract
Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are often avoided due to their computational cost. This work analyzes the impact of finite-precision arithmetic on Newton steps and establishes a convergence theorem for mixed-precision Newton optimizers, including "quasi" and "inexact" variants. The theorem provides not only convergence guarantees but also a priori estimates of the achievable solution accuracy. Empirical evaluations on standard regression benchmarks demonstrate that the proposed methods outperform Adam on the Australian and MUSH datasets. The second part of the manuscript introduces GN_k, a generalized Gauss-Newton method that enables partial computation of second-order derivatives. GN_k attains performance comparable to full Newton's method on regression tasks while requiring significantly fewer derivative evaluations.
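The Gauss-Newton family avoids full second-order derivatives by approximating the Hessian of a least-squares loss with J^T J. The sketch below shows this classical step; GN_k's partial computation of second-order terms refines this idea, but its exact construction is not specified here, so the code and the toy linear-regression problem are illustrative assumptions.

```python
import numpy as np

def gauss_newton_step(residual, jac, w):
    """One Gauss-Newton step for the loss 0.5 * ||r(w)||^2.

    Only the Jacobian J of the residual is needed: the Hessian is
    approximated by J^T J, so no second derivatives are computed.
    """
    r = residual(w)
    J = jac(w)
    # Normal equations: (J^T J) dw = J^T r
    dw = np.linalg.solve(J.T @ J, J.T @ r)
    return w - dw

# Toy noiseless linear regression: r(w) = X w - y, so J = X and one
# Gauss-Newton step recovers the true weights exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = gauss_newton_step(lambda w: X @ w - y, lambda w: X, np.zeros(3))
print(w)  # recovers w_true for this linear problem
```

For nonlinear residuals the J^T J approximation drops the residual-weighted second-derivative terms; computing a subset of those terms, as GN_k does, trades cost against fidelity to the full Newton step.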