🤖 AI Summary
This work addresses the slow convergence of gradient descent on poorly conditioned objectives and its reliance on strong global smoothness assumptions. We introduce *directional smoothness*, a geometric quantity that characterizes the local smoothness of the objective along the optimization trajectory, thereby circumventing restrictive global Lipschitz-continuity requirements on the gradient. Leveraging this path-dependent characterization, we derive trajectory-aware suboptimality bounds and formulate an implicit equation whose solutions yield strongly adapted step-sizes. We prove that the Polyak step-size and normalized gradient descent achieve fast, path-adaptive convergence despite using no knowledge of the directional smoothness. Our methodology integrates directional smoothness analysis, implicit step-size design, and convergence theory for both convex and nonconvex settings. Experiments on logistic regression demonstrate that the new bounds substantially improve upon classical $L$-smoothness-based guarantees. Notably, this is the first work to provide path-dependent convergence rates for these two canonical algorithms without requiring prior knowledge of smoothness parameters.
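To make the summary concrete, here is a sketch of the central objects; the notation is ours and may differ from the paper's. A directional smoothness function $D(x, y)$ is any quantity furnishing a local quadratic upper bound between a pair of points,

$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{D(x, y)}{2} \|y - x\|^2,$$

so that for a gradient step $x_{k+1} = x_k - \eta_k \nabla f(x_k)$, minimizing the right-hand side suggests the step-size equation $\eta_k = 1 / D(x_k, x_{k+1})$, which is implicit because $x_{k+1}$ itself depends on $\eta_k$. For a convex quadratic $f(x) = \frac{1}{2} x^\top A x - b^\top x$ the gradient variation is exact, $\nabla f(y) - \nabla f(x) = A(y - x)$, so the ratio $\|\nabla f(y) - \nabla f(x)\| / \|y - x\|$ is a valid choice of $D$ and, evaluated along the step, does not depend on $\eta_k$. The implicit equation then collapses to the closed form

$$\eta_k = \frac{\|\nabla f(x_k)\|}{\|A \, \nabla f(x_k)\|} \ge \frac{1}{\lambda_{\max}(A)} = \frac{1}{L},$$

so the adapted step is never shorter than the classical $1/L$ step.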
📝 Abstract
We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper bounds on the objective. Minimizing these upper bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on $L$-smoothness.
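As a rough illustration of the kind of comparison the experiments describe, below is a minimal numpy sketch, not the authors' code: it runs the Polyak step-size on a synthetic logistic-regression problem and tracks the pointwise gradient-variation ratio between consecutive iterates against the global constant $L$. All helper names, and the shortcut of estimating $f^*$ with a long plain-GD run, are our own assumptions.

```python
# Minimal sketch (not the authors' code): Polyak step-size GD on synthetic
# logistic regression, tracking the pointwise directional-smoothness ratio
# D_k along the trajectory against the global smoothness constant L.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

def loss(w):
    # Binary logistic loss with labels in {-1, +1}.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    # Gradient of the logistic loss; sigmoid written via tanh for stability.
    s = -y * 0.5 * (1.0 + np.tanh(-0.5 * y * (X @ w)))
    return X.T @ s / n

# Global smoothness constant for logistic regression: L = ||X||_2^2 / (4n).
L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)

def directional_ratio(w_prev, w_next):
    # Pointwise gradient-variation ratio between consecutive iterates; one
    # natural instance of a directional smoothness quantity.
    return (np.linalg.norm(grad(w_next) - grad(w_prev))
            / np.linalg.norm(w_next - w_prev))

# The Polyak step-size assumes f* is known; here we estimate it with a long
# run of classical 1/L gradient descent (an assumption of this sketch).
w = np.zeros(d)
for _ in range(5000):
    w -= (1.0 / L) * grad(w)
f_star = loss(w)

w = np.zeros(d)
for k in range(20):
    g = grad(w)
    gap = max(loss(w) - f_star, 1e-12)  # guard against a negative gap
    eta = gap / np.linalg.norm(g) ** 2  # Polyak step-size
    # Normalized-GD alternative: w_next = w - gamma * g / np.linalg.norm(g)
    w_next = w - eta * g
    D_k = directional_ratio(w, w_next)
    # Along the path D_k is typically much smaller than L, which is why
    # path-dependent bounds can be much tighter than the classical analysis.
    print(f"k={k:2d}  f-f*={loss(w) - f_star:.3e}  eta={eta:.3f}  "
          f"D_k={D_k:.3f}  L={L:.3f}")
    w = w_next
```

Note that neither step rule in the loop consults $L$ or $D_k$; the directional ratio is only measured after the fact, matching the claim that these methods adapt to the path without knowing the smoothness parameters.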