On the Complexity of Finding Small Subgradients in Nonsmooth Optimization

📅 2022-09-21
🏛️ arXiv.org
📈 Citations: 9
Influential: 2
🤖 AI Summary
This paper studies the oracle complexity of finding a (δ,ε)-stationary point—a point whose δ-neighborhood contains a (convex combination of) subgradients of norm at most ε—of Lipschitz functions in nonsmooth optimization. It establishes for the first time that deterministic first-order algorithms necessarily incur dimension-dependent complexity, precluding dimension-free rates; in contrast, randomized algorithms achieve the dimension-free upper bound Õ(1/(δε³)), and the paper complements this with lower bounds that hold for any randomized algorithm, with or without convexity. It further shows that convexity helps: for convex functions a deterministic O(1/ε²) upper bound is attained, even though no finite-time algorithm can produce points with small exact subgradients in general (so the relaxation to δ > 0 is needed even in the convex case); for smooth functions, the randomized rate can be derandomized with only a logarithmic dependence on the smoothness parameter. The core contribution is a precise characterization of how determinism vs. randomness, convexity, and smoothness govern the oracle complexity of this task, with upper and lower bounds for each setting.
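For orientation, the notion the summary paraphrases is the (δ,ε)-stationarity of Zhang et al. [2020], defined through the Goldstein δ-subdifferential. The following is a sketch of that standard definition, not an excerpt from the paper:

```latex
% Goldstein delta-subdifferential of a Lipschitz function f at x, and
% (delta, eps)-stationarity in the sense of Zhang et al. [2020] (standard definitions).
\[
  \partial_{\delta} f(x) \;=\; \operatorname{conv}\!\Bigl( \textstyle\bigcup_{\,y:\ \|y-x\|\le\delta} \partial f(y) \Bigr),
  \qquad
  x \ \text{is } (\delta,\epsilon)\text{-stationary}
  \ \iff\ \min_{g \in \partial_{\delta} f(x)} \|g\| \le \epsilon .
\]
```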
📝 Abstract
We study the oracle complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz functions, in the sense proposed by Zhang et al. [2020]. While there exist dimension-free randomized algorithms for producing such points within $\widetilde{O}(1/\delta\epsilon^3)$ first-order oracle calls, we show that no dimension-free rate can be achieved by a deterministic algorithm. On the other hand, we point out that this rate can be derandomized for smooth functions with merely a logarithmic dependence on the smoothness parameter. Moreover, we establish several lower bounds for this task which hold for any randomized algorithm, with or without convexity. Finally, we show how the convergence rate of finding $(\delta,\epsilon)$-stationary points can be improved in case the function is convex, a setting which we motivate by proving that in general no finite time algorithm can produce points with small subgradients even for convex functions.
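To make the object of study concrete, here is a minimal, hypothetical gradient-sampling sketch in the spirit of Goldstein's method—not one of the algorithms analyzed in the paper—that searches for an approximately (δ,ε)-stationary point of f(x) = ‖x‖₁. The averaging of sampled subgradients, the step rule, and all parameter values are illustrative assumptions:

```python
import numpy as np

# Hypothetical illustration only -- NOT the algorithms analyzed in the paper.
# A Goldstein-style gradient-sampling heuristic: average subgradients sampled in the
# delta-ball around the iterate (a crude element of the Goldstein delta-subdifferential)
# and stop once that average has norm at most eps. Target: f(x) = ||x||_1 (convex, Lipschitz).

def subgrad_l1(x):
    """One valid subgradient of f(x) = ||x||_1 at x (choosing 0 at kinks)."""
    return np.sign(x)

def find_delta_eps_stationary(x0, delta=0.1, eps=0.25, samples=400, max_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    g_bar = subgrad_l1(x)
    for _ in range(max_iter):
        # Sample points uniformly from the delta-ball around x.
        dirs = rng.normal(size=(samples, d))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        radii = delta * rng.uniform(size=(samples, 1)) ** (1.0 / d)
        pts = x + radii * dirs
        # Average of sampled subgradients: a convex combination of subgradients taken
        # within distance delta of x, i.e. an element of the Goldstein delta-subdifferential.
        g_bar = subgrad_l1(pts).mean(axis=0)
        if np.linalg.norm(g_bar) <= eps:
            return x, g_bar  # x is (approximately) (delta, eps)-stationary, with certificate g_bar
        # Step of length at most delta against the averaged subgradient.
        x = x - delta * g_bar / max(1.0, np.linalg.norm(g_bar))
    return x, g_bar

if __name__ == "__main__":
    x_hat, g_hat = find_delta_eps_stationary(np.array([0.7, -0.4, 0.55]))
    print("candidate point:", np.round(x_hat, 3))
    print("averaged subgradient norm:", round(float(np.linalg.norm(g_hat)), 3))
```

The returned certificate is an average of subgradients collected within the δ-ball rather than an exact subgradient at the iterate; for this convex example no point other than the origin has a small exact subgradient, which mirrors the paper's motivation for relaxing to δ > 0.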
Problem

Research questions and friction points this paper is trying to address.

Understand the oracle complexity of deterministic vs. randomized algorithms for finding (δ,ε)-stationary points in nonsmooth optimization
Establish lower bounds for this task that hold for any randomized algorithm, with or without convexity
Determine how much the convergence rate improves when the function is convex
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proof that no dimension-free rate is achievable by deterministic algorithms, in contrast to existing dimension-free randomized ones
Derandomization of the randomized rate for smooth functions, with only a logarithmic dependence on the smoothness parameter
Improved convergence rate for finding (δ,ε)-stationary points of convex functions