Mirror Descent Under Generalized Smoothness

📅 2025-02-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing generalized-smoothness definitions and their convergence guarantees are tied to Euclidean geometry and the ℓ2-norm, so they do not cover first-order optimization in non-Euclidean spaces. Method: We introduce ℓ*-smoothness, a smoothness notion that measures the norm of the Hessian with respect to a general norm and its dual, and propose a generalized self-bounding property that bounds gradients by controlling suboptimality gaps. Contribution/Results: Building on this framework, we establish convergence for mirror-descent-type algorithms in both deterministic and stochastic settings: (i) under ℓ*-smoothness, the deterministic algorithms match the rates known under classical smoothness; (ii) under a new bounded noise condition that encompasses the widely adopted bounded and affine noise assumptions, stochastic mirror descent attains an anytime convergence guarantee. The work thereby extends generalized smoothness beyond Euclidean geometry to optimization under general norms.
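For readers unfamiliar with the mirror descent update, the following minimal sketch shows one standard non-Euclidean instance: mirror descent on the probability simplex with the negative-entropy mirror map (the exponentiated-gradient update). It is not the paper's algorithm or step-size rule, which this summary does not specify. The function name `entropic_mirror_descent`, the constant step size, and the toy quadratic objective are all assumptions made for illustration, and the toy objective is classically smooth, chosen only to make the mechanics of the update visible.

```python
import math
from typing import Callable, List, Sequence


def entropic_mirror_descent(
    grad: Callable[[Sequence[float]], List[float]],
    x0: Sequence[float],
    step_size: float,
    num_steps: int,
) -> List[float]:
    """Mirror descent on the probability simplex with the negative-entropy mirror map.

    Each step multiplies coordinate i by exp(-step_size * g_i) and renormalizes,
    so the iterates stay on the simplex without an explicit projection.
    """
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        # Shifting all gradient entries by a constant leaves the normalized
        # update unchanged and keeps the exponentials numerically tame.
        g_min = min(g)
        weights = [xi * math.exp(-step_size * (gi - g_min)) for xi, gi in zip(x, g)]
        total = sum(weights)
        x = [w / total for w in weights]
    return x


# Hypothetical toy objective f(x) = 0.5 * ||x - target||^2; since target lies
# on the simplex, it is the constrained minimizer.
target = [0.5, 0.3, 0.2]


def toy_grad(x: Sequence[float]) -> List[float]:
    return [xi - ti for xi, ti in zip(x, target)]


x_final = entropic_mirror_descent(
    toy_grad, [1 / 3, 1 / 3, 1 / 3], step_size=0.5, num_steps=500
)
print([round(v, 4) for v in x_final])
```

The multiplicative form is what distinguishes this update from projected gradient descent: the geometry is set by the mirror map (here entropy, paired with the ℓ1-norm and its dual ℓ∞-norm) rather than by the Euclidean norm.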

📝 Abstract

Smoothness is crucial for attaining fast rates in first-order optimization. However, many optimization problems in modern machine learning involve non-smooth objectives. Recent studies relax the smoothness assumption by allowing the Lipschitz constant of the gradient to grow with respect to the gradient norm, which accommodates a broad range of objectives in practice. Despite this progress, existing generalizations of smoothness are restricted to Euclidean geometry with $\ell_2$-norm and only have theoretical guarantees for optimization in the Euclidean space. In this paper, we address this limitation by introducing a new $\ell*$-smoothness concept that measures the norm of Hessian in terms of a general norm and its dual, and establish convergence for mirror-descent-type algorithms, matching the rates under the classic smoothness. Notably, we propose a generalized self-bounding property that facilitates bounding the gradients via controlling suboptimality gaps, serving as a principal component for convergence analysis. Beyond deterministic optimization, we establish an anytime convergence for stochastic mirror descent based on a new bounded noise condition that encompasses the widely adopted bounded or affine noise assumptions.
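To make the abstract's wording concrete, the relaxed smoothness conditions it refers to can be sketched as follows. The first line is the well-known $(L_0, L_1)$-smoothness condition from the Euclidean literature. The second line is only an assumed reading of "the norm of Hessian in terms of a general norm and its dual": the operator-norm notation and the non-decreasing function $\ell$ are introduced here for illustration, and the paper's exact definition may differ.

```latex
% Euclidean (L_0, L_1)-smoothness: the Hessian norm may grow affinely
% with the gradient norm.
\|\nabla^2 f(x)\|_2 \;\le\; L_0 + L_1 \,\|\nabla f(x)\|_2 .

% Assumed non-Euclidean analogue for a general norm \|\cdot\| with dual
% norm \|\cdot\|_*: the Hessian is measured as an operator from
% (\mathbb{R}^d, \|\cdot\|) to (\mathbb{R}^d, \|\cdot\|_*), and \ell is a
% non-decreasing function of the dual gradient norm.
\sup_{\|h\| \le 1} \bigl\|\nabla^2 f(x)\, h\bigr\|_*
  \;\le\; \ell\bigl(\|\nabla f(x)\|_*\bigr) .
```

Taking $\|\cdot\| = \|\cdot\|_2$ and $\ell(u) = L_0 + L_1 u$ in the second line recovers the first, and a constant $\ell$ recovers classical smoothness with respect to the chosen norm.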

Problem

Research questions and friction points this paper is trying to address.

Non-smooth Optimization

Machine Learning

Convergence Guarantee

Innovation

Methods, ideas, or system contributions that make the work stand out.

ℓ*-smoothness

generalized self-bounding property

anytime convergence of stochastic mirror descent

🔎 Similar Papers

Directional Smoothness and Gradient Methods: Convergence and Adaptivity