Frugality in second-order optimization: floating-point approximations for Newton's method

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Higher-order optimizers are rarely used in machine-learning training because of their prohibitive computational overhead. Method: the paper systematically analyzes how finite-precision arithmetic affects the convergence of Newton's method, establishing the first rigorous convergence theorems for mixed-precision variants, including quasi-Newton and inexact Newton methods, and enabling a priori estimates of the attainable solution accuracy. It further proposes the generalized Gauss–Newton method GN_k, which sharply reduces computational cost by computing only a subset of second-order derivatives while matching full Newton performance on regression tasks. Results: experiments show that GN_k outperforms Adam on standard benchmarks while incurring far lower computational overhead than conventional second-order methods. The work advances the practical deployment of higher-order optimization by providing both theoretical guarantees and an efficient algorithmic design.
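The core idea behind a mixed-precision Newton step, forming and factorizing the Hessian system in low precision while keeping the gradient and the iterate in high precision, can be sketched as follows. This is a minimal illustration under assumed interfaces, not the paper's algorithm; the function name and signature are hypothetical.

```python
import numpy as np

def mixed_precision_newton_step(grad_fn, hess_fn, x, low=np.float32):
    """One Newton step with the Hessian solve done in low precision.

    grad_fn / hess_fn return the gradient and Hessian at x in float64.
    Only the linear system H p = -g is formed and solved in `low`
    precision; the iterate itself stays in float64. (Illustrative
    sketch, not the paper's exact scheme.)
    """
    g = grad_fn(x)                      # high-precision gradient
    H = hess_fn(x).astype(low)          # Hessian stored in low precision
    # Solve in low precision, then promote the step back to float64.
    p = np.linalg.solve(H, -g.astype(low)).astype(np.float64)
    return x + p
```

On a well-conditioned quadratic, the float32 solve perturbs the step only at roughly unit-roundoff level, which is the kind of accuracy limit the paper's convergence theorems quantify a priori.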

📝 Abstract
Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are often avoided due to their computational cost. This work analyzes the impact of finite-precision arithmetic on Newton steps and establishes a convergence theorem for mixed-precision Newton optimizers, including "quasi" and "inexact" variants. The theorem provides not only convergence guarantees but also a priori estimates of the achievable solution accuracy. Empirical evaluations on standard regression benchmarks demonstrate that the proposed methods outperform Adam on the Australian and MUSH datasets. The second part of the manuscript introduces GN_k, a generalized Gauss-Newton method that enables partial computation of second-order derivatives. GN_k attains performance comparable to full Newton's method on regression tasks while requiring significantly fewer derivative evaluations.
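The classical Gauss-Newton approximation that GN_k generalizes can be sketched for a least-squares loss 0.5·||r(x)||²: the Hessian is replaced by JᵀJ, dropping the residual-curvature terms, plausibly the second-order pieces that GN_k would compute only partially. A hedged sketch under assumed interfaces; the function name is illustrative, not the paper's:

```python
import numpy as np

def gauss_newton_step(residual_fn, jac_fn, x):
    """Gauss-Newton step for the least-squares loss 0.5 * ||r(x)||^2.

    Uses H ~ J^T J, which omits the residual-curvature term
    sum_i r_i * Hess(r_i); full Newton would include it.
    (Illustrative sketch, not the paper's GN_k.)
    """
    r = residual_fn(x)
    J = jac_fn(x)
    g = J.T @ r                         # gradient of 0.5 * ||r||^2
    H = J.T @ J                         # Gauss-Newton Hessian approximation
    return x - np.linalg.solve(H, g)
```

For a linear model r(x) = Ax − b, one such step from any starting point lands exactly on the least-squares solution, since JᵀJ is then the true Hessian.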
Problem

Research questions and friction points this paper is trying to address.

The computational cost of Newton's method limits its use in machine learning
Finite-precision arithmetic affects the convergence of second-order optimizers
Computing all second-order derivatives is prohibitively expensive
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed-precision Newton optimizers with convergence guarantees
Generalized Gauss-Newton method reduces derivative computations
Floating-point approximations enable efficient second-order optimization
Giuseppe Carrino
Department of Computer Science and Engineering, University of Bologna
Elena Loli Piccolomini
Department of Computer Science and Engineering, University of Bologna
Elisa Riccietti
ENS Lyon
Numerical Optimization · Machine Learning
Theo Mary
Sorbonne Université, CNRS, LIP6
Numerical Linear Algebra · High Performance Computing · Numerical Analysis