🤖 AI Summary
Higher-order optimizers are rarely used in machine-learning training because of their prohibitive computational overhead. Method: This paper systematically analyzes the impact of finite-precision arithmetic on the convergence of Newton's method, establishing the first rigorous convergence theorems for mixed-precision variants (including quasi-Newton and inexact Newton methods) and enabling quantitative a priori bounds on the achievable solution accuracy. It further proposes GNₖ, a generalized Gauss-Newton method that drastically reduces computational cost by computing only a subset of second-order derivatives while matching full Newton performance on regression tasks. Results: Experiments demonstrate that GNₖ outperforms Adam on standard benchmarks while incurring significantly lower computational overhead than conventional second-order methods. By pairing theoretical guarantees with an efficient algorithmic design, this work advances the practical deployment of higher-order optimization.
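The mixed-precision idea can be illustrated with a minimal sketch: form and factor the Hessian in low precision (here float32) while accumulating the iterate in high precision. This is only an illustration of the general technique; the paper's actual precision assignments and algorithms are not reproduced here, and the quadratic test problem is invented for the example.

```python
import numpy as np

def mixed_precision_newton_step(grad, hess, x, low=np.float32):
    """One Newton step with the Hessian stored and solved in low precision.

    Illustrative sketch only: the gradient and the iterate stay in the
    working (high) precision, mimicking a mixed-precision inexact Newton step.
    """
    g = grad(x)                          # gradient in high precision
    H = hess(x).astype(low)              # Hessian rounded to low precision
    # Solve H p = g in low precision, then update in high precision.
    p = np.linalg.solve(H, g.astype(low)).astype(x.dtype)
    return x - p

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.zeros(2)
for _ in range(3):
    x = mixed_precision_newton_step(lambda v: A @ v - b, lambda v: A, x)
print(x)  # close to np.linalg.solve(A, b), limited by float32 rounding
```

On this quadratic each step is exact up to float32 rounding, so the iterates stall at an accuracy floor set by the low precision, which is exactly the kind of a priori accuracy bound the convergence theorems quantify.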
📝 Abstract
Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are often avoided due to their computational cost. This work analyzes the impact of finite-precision arithmetic on Newton steps and establishes a convergence theorem for mixed-precision Newton optimizers, including "quasi" and "inexact" variants. The theorem provides not only convergence guarantees but also a priori estimates of the achievable solution accuracy. Empirical evaluations on standard regression benchmarks demonstrate that the proposed methods outperform Adam on the Australian and MUSH datasets. The second part of the manuscript introduces GN_k, a generalized Gauss-Newton method that enables partial computation of second-order derivatives. GN_k attains performance comparable to full Newton's method on regression tasks while requiring significantly fewer derivative evaluations.
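The Gauss-Newton family avoids full second-order derivatives by approximating the Hessian of a least-squares loss with J^T J. The sketch below shows this classical step; GN_k's partial computation of second-order terms refines this idea, but its exact construction is not specified here, so the code and the toy linear-regression problem are illustrative assumptions.

```python
import numpy as np

def gauss_newton_step(residual, jac, w):
    """One Gauss-Newton step for the loss 0.5 * ||r(w)||^2.

    Only the Jacobian J of the residual is needed: the Hessian is
    approximated by J^T J, so no second derivatives are computed.
    """
    r = residual(w)
    J = jac(w)
    # Normal equations: (J^T J) dw = J^T r
    dw = np.linalg.solve(J.T @ J, J.T @ r)
    return w - dw

# Toy noiseless linear regression: r(w) = X w - y, so J = X and one
# Gauss-Newton step recovers the true weights exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = gauss_newton_step(lambda w: X @ w - y, lambda w: X, np.zeros(3))
print(w)  # recovers w_true for this linear problem
```

For nonlinear residuals the J^T J approximation drops the residual-weighted second-derivative terms; computing a subset of those terms, as GN_k does, trades cost against fidelity to the full Newton step.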