🤖 AI Summary
This paper addresses nonconvex optimization problems characterized by bound constraints, stochastic gradient noise, and hierarchical problem structure. Method: We propose two adaptive second-order optimization algorithms that extend AdaGrad to multilevel settings and to the additive Schwarz domain-decomposition framework, unifying the treatment of noisy gradients and bound constraints. Grounded in the objective-function-free optimization (OFFO) paradigm, our approach recursively integrates approximate gradient and curvature information. Contribution/Results: We establish a single joint convergence theory covering both methods, guaranteeing, with high probability, convergence to an ε-approximate first-order critical point within O(ε⁻²) iterations. Extensive experiments demonstrate significant computational efficiency and robustness on large-scale tasks, including PDE-based problems and deep neural network training.
📝 Abstract
Two OFFO (Objective-Function-Free Optimization) noise-tolerant algorithms are presented that handle bound constraints and inexact gradients, and use second-order information when available. The first is a multilevel method exploiting a hierarchical description of the problem, and the second is a domain-decomposition method covering the standard additive Schwarz decompositions. Both are generalizations of the first-order AdaGrad algorithm for unconstrained optimization. Because these algorithms share a common theoretical framework, a single convergence/complexity theory is provided which covers them both. Its main result is that, with high probability, both methods need at most $O(ε^{-2})$ iterations and noisy gradient evaluations to compute an $ε$-approximate first-order critical point of the bound-constrained problem. Extensive numerical experiments are discussed on applications ranging from PDE-based problems to deep neural network training, illustrating the remarkable computational efficiency of both methods.
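For readers unfamiliar with the first-order baseline that both methods generalize, the sketch below shows a generic AdaGrad iteration with projection onto box constraints. This is a textbook illustration only, not the paper's OFFO algorithms: the function name `projected_adagrad`, the step size, and the test problem are all illustrative assumptions.

```python
import numpy as np

def projected_adagrad(grad, x0, lower, upper, lr=0.5, eps=1e-8, iters=200):
    """Illustrative projected AdaGrad on a box [lower, upper].

    Generic sketch, not the paper's multilevel or additive-Schwarz
    variants: each coordinate's step is scaled by the square root of
    the accumulated squared gradients, and every iterate is clipped
    back into the feasible box.
    """
    x = np.asarray(x0, dtype=float)
    g2 = np.zeros_like(x)  # running sum of squared gradients per coordinate
    for _ in range(iters):
        g = grad(x)
        g2 += g * g
        x = x - lr * g / (np.sqrt(g2) + eps)  # AdaGrad per-coordinate scaling
        x = np.clip(x, lower, upper)          # projection onto the bounds
    return x

# Minimize ||x - 2||^2 subject to 0 <= x <= 1; the constrained
# minimizer sits on the upper bound, x = (1, 1, 1).
x_star = projected_adagrad(lambda x: 2.0 * (x - 2.0), np.zeros(3), 0.0, 1.0)
```

Note that AdaGrad never evaluates the objective itself, only gradients, which is the defining property of the OFFO paradigm the paper builds on.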