🤖 AI Summary
For unconstrained convex-concave minimax optimization, this paper proposes a class of inexact regularized Newton-type algorithms that incorporate second-order information into the extra-gradient framework while ensuring global convergence under inexact computations. Theoretically, it achieves an $O(\varepsilon^{-2/3})$ iteration complexity—matching the known lower bound—for such problems. Each iteration requires only one Schur decomposition and $O(\log\log(1/\varepsilon))$ linear solver calls, shaving an $O(\log\log(1/\varepsilon))$ factor off the number of Schur decompositions required by prior line-search-based second-order methods. Through an analysis based on the restricted gap function, the authors establish boundedness of the iterates and convergence of the averaged sequence to an $\varepsilon$-saddle point. Experiments on synthetic and real-world datasets demonstrate that the proposed methods outperform existing second-order minimax optimization algorithms in both accuracy and efficiency.
📝 Abstract
We propose and analyze several inexact regularized Newton-type methods for finding a global saddle point of *convex-concave* unconstrained min-max optimization problems. Compared to first-order methods, our understanding of second-order methods for min-max optimization is relatively limited, as obtaining global rates of convergence with second-order information is much more involved. In this paper, we examine how second-order information can be used to speed up extra-gradient methods, even under inexactness. Specifically, we show that the proposed methods generate iterates that remain within a bounded set and that the averaged iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function. This matches the theoretically established lower bound in this context. We also provide a simple routine for solving the subproblem at each iteration, requiring a single Schur decomposition and $O(\log\log(1/\epsilon))$ calls to a linear system solver in a quasi-upper-triangular system. Thus, our method improves on existing line-search-based second-order min-max optimization methods by shaving off an $O(\log\log(1/\epsilon))$ factor in the required number of Schur decompositions. Finally, we present numerical experiments on synthetic and real data that demonstrate the efficiency of the proposed methods.
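As context for the convergence criterion, the restricted gap function is a standard measure of saddle-point quality for convex-concave problems; a sketch of the usual definition is below (the paper's exact choice of the bounded set $\mathcal{B}$ may differ):

```latex
% Restricted gap function over a compact set B containing the iterates
% (standard definition for f(x, y) convex in x and concave in y).
\mathrm{Gap}_{\mathcal{B}}(\bar{x}, \bar{y})
  \;=\; \max_{y \in \mathcal{B}} f(\bar{x}, y)
  \;-\; \min_{x \in \mathcal{B}} f(x, \bar{y}).
% A point is an epsilon-saddle point when this gap is at most epsilon:
\mathrm{Gap}_{\mathcal{B}}(\bar{x}, \bar{y}) \;\le\; \epsilon.
```

Under this criterion, the $O(\epsilon^{-2/3})$ bound above is stated for the averaged iterates.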