Balancing Gradient and Hessian Queries in Non-Convex Optimization

📅 2025-10-23
🤖 AI Summary
This work addresses the trade-off between gradient and Hessian queries when finding approximate critical points in non-convex optimization. It proposes a general algorithm that works with any number of Hessian queries, tolerates approximate Hessian computations, and yields an improved gradient query complexity even when only a single Hessian query is allowed. The main results assume an $L_2$-Lipschitz Hessian; an additional $L_1$-Lipschitz gradient assumption is needed only to recover the state-of-the-art bound of $O(L_1^{1/2}L_2^{1/4}\Delta\epsilon^{-7/4})$ gradient queries. Under the $L_2$-Lipschitz Hessian condition, the gradient query complexity is $\tilde{O}(L_2^{1/4} n_H^{-1/2}\Delta\epsilon^{-9/4})$, where $n_H$ is the number of Hessian queries and $\Delta$ bounds the initial sub-optimality. As consequences, the gradient query complexity is $\tilde{O}(d^{1/3}L_2^{1/2}\Delta\epsilon^{-3/2})$ in bounded dimension $d$ and $\tilde{O}(L_2^{3/4}\Delta^{3/2}\epsilon^{-9/4})$ with a single Hessian query.
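To make the $n_H^{-1/2}$ dependence concrete, here is a minimal Python sketch that evaluates only the leading term $L_2^{1/4} n_H^{-1/2}\Delta\epsilon^{-9/4}$ of the stated bound. The function name and the numeric values are illustrative assumptions, not taken from the paper, and the constants and logarithmic factors hidden by $\tilde{O}$ are dropped, so the outputs are relative scalings rather than actual query counts.

```python
def gradient_query_leading_term(L2: float, delta: float, eps: float, n_H: int) -> float:
    """Leading term L2^(1/4) * n_H^(-1/2) * delta * eps^(-9/4).

    Hypothetical helper: constants and log factors hidden by the
    tilde-O notation are ignored, so only ratios are meaningful.
    """
    if n_H < 1:
        raise ValueError("n_H must be a positive integer")
    return (L2 ** 0.25) * (n_H ** -0.5) * delta * (eps ** -2.25)


# Illustrative (assumed) problem parameters.
few_hessians = gradient_query_leading_term(L2=1.0, delta=1.0, eps=1e-2, n_H=4)
more_hessians = gradient_query_leading_term(L2=1.0, delta=1.0, eps=1e-2, n_H=16)

# Quadrupling the Hessian budget halves the leading term of the gradient bound.
ratio = more_hessians / few_hessians
```

In this sketch, going from 4 to 16 Hessian queries halves the leading term, which is the square-root trade-off expressed by the $n_H^{-1/2}$ factor.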

📝 Abstract
We develop optimization methods which offer new trade-offs between the number of gradient and Hessian computations needed to compute the critical point of a non-convex function. We provide a method that for any twice-differentiable $f\colon \mathbb{R}^d \rightarrow \mathbb{R}$ with $L_2$-Lipschitz Hessian, input initial point with $\Delta$-bounded sub-optimality, and sufficiently small $\epsilon>0$, outputs an $\epsilon$-critical point, i.e., a point $x$ such that $\|\nabla f(x)\| \leq \epsilon$, using $\tilde{O}(L_2^{1/4} n_H^{-1/2}\Delta\epsilon^{-9/4})$ queries to a gradient oracle and $n_H$ queries to a Hessian oracle for any positive integer $n_H$. As a consequence, we obtain an improved gradient query complexity of $\tilde{O}(d^{1/3}L_2^{1/2}\Delta\epsilon^{-3/2})$ in the case of bounded dimension and of $\tilde{O}(L_2^{3/4}\Delta^{3/2}\epsilon^{-9/4})$ in the case where we are allowed only a single Hessian query. We obtain these results through a more general algorithm which can handle approximate Hessian computations and recovers the state-of-the-art bound of computing an $\epsilon$-critical point with $O(L_1^{1/2}L_2^{1/4}\Delta\epsilon^{-7/4})$ gradient queries provided that $f$ also has an $L_1$-Lipschitz gradient.
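The oracle model and the $\epsilon$-critical condition can be sketched in a few lines of Python. This is not the paper's algorithm: the class `CountingOracle`, the helper `is_eps_critical`, and the toy function $f(x, y) = x^2 + \cos(y)$ are assumptions chosen for illustration (the toy function is non-convex and its Hessian is Lipschitz with constant 1). The sketch only shows how gradient and Hessian queries are counted separately and how $\|\nabla f(x)\| \leq \epsilon$ is checked.

```python
import math
from typing import Callable, List, Sequence


class CountingOracle:
    """Wraps gradient and Hessian callables and counts queries to each (hypothetical helper)."""

    def __init__(self,
                 grad: Callable[[Sequence[float]], List[float]],
                 hess: Callable[[Sequence[float]], List[List[float]]]) -> None:
        self._grad = grad
        self._hess = hess
        self.gradient_queries = 0
        self.hessian_queries = 0

    def gradient(self, x: Sequence[float]) -> List[float]:
        self.gradient_queries += 1
        return self._grad(x)

    def hessian(self, x: Sequence[float]) -> List[List[float]]:
        self.hessian_queries += 1
        return self._hess(x)


def is_eps_critical(oracle: CountingOracle, x: Sequence[float], eps: float) -> bool:
    """Check ||grad f(x)|| <= eps in the Euclidean norm; costs one gradient query."""
    g = oracle.gradient(x)
    return math.sqrt(sum(gi * gi for gi in g)) <= eps


# Assumed toy objective: f(x, y) = x^2 + cos(y).
def grad_f(p: Sequence[float]) -> List[float]:
    x, y = p
    return [2.0 * x, -math.sin(y)]


def hess_f(p: Sequence[float]) -> List[List[float]]:
    _, y = p
    return [[2.0, 0.0], [0.0, -math.cos(y)]]


oracle = CountingOracle(grad_f, hess_f)
at_origin = is_eps_critical(oracle, [0.0, 0.0], eps=1e-6)       # gradient vanishes here
away = is_eps_critical(oracle, [1.0, math.pi / 2], eps=1e-6)    # gradient is [2, -1]
H0 = oracle.hessian([0.0, 0.0])                                  # the single Hessian query
```

After this runs, the oracle has recorded two gradient queries and one Hessian query. The origin passes the $\epsilon$-critical test even though the Hessian there is indefinite, which illustrates that an $\epsilon$-critical point need not be a local minimum.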

Problem

Research questions and friction points this paper is trying to address.

Optimizing trade-offs between gradient and Hessian queries in non-convex optimization
Developing methods to find critical points with fewer computational queries
Improving query complexity bounds for non-convex optimization problems

Innovation

Methods, ideas, or system contributions that make the work stand out.

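Offers new trade-offs between gradient and Hessian queries for any positive number $n_H$ of Hessian queries
Finds $\epsilon$-critical points with fewer gradient queries, including when only a single Hessian query is allowed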
Handles approximate Hessian computations efficiently