Directional Smoothness and Gradient Methods: Convergence and Adaptivity

📅 2024-03-06
🏛️ arXiv.org
📈 Citations: 5
Influential: 1
🤖 AI Summary
This work addresses the slow convergence of gradient descent on complex objectives and its reliance on strong global smoothness assumptions. We introduce *directional smoothness*, a novel geometric concept characterizing the local smoothness of the objective function along the optimization trajectory—thereby circumventing restrictive global Lipschitz continuity requirements. Leveraging this path-dependent characterization, we derive a trajectory-aware suboptimality bound and formulate an implicit adaptive step-size equation. We theoretically establish that Polyak’s step size and normalized gradient descent inherently achieve path-adaptive fast convergence. Our methodology integrates directional smoothness analysis, implicit step-size design, and convergence theory for both convex and nonconvex settings. Experiments on logistic regression demonstrate that our new bound substantially improves upon classical $L$-smoothness-based guarantees. Notably, this is the first work to provide path-dependent convergence rates for these two canonical algorithms without requiring prior knowledge of smoothness parameters.
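Since the summary describes directional smoothness only in prose, a minimal sketch may help. This is not the paper's code: the gradient-variation ratio below is an illustrative stand-in for the paper's directional smoothness quantity, tracked along a plain GD trajectory on a random quadratic and compared against the global smoothness constant $L$.

```python
# Minimal sketch (not the paper's code): track a gradient-variation ratio along
# the GD trajectory as an illustrative stand-in for directional smoothness, and
# compare it to the global smoothness constant L of a quadratic objective.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
H = A.T @ A / 50.0                   # Hessian of the quadratic f(x) = 0.5 x^T H x
L = np.linalg.eigvalsh(H).max()      # global (worst-case) smoothness constant

def grad(x):
    return H @ x

x = rng.standard_normal(10)
eta = 1.0 / L
for _ in range(20):
    x_next = x - eta * grad(x)
    # Gradient variation along the step; it stays below L whenever the
    # trajectory avoids the sharpest directions of the objective.
    d = np.linalg.norm(grad(x_next) - grad(x)) / np.linalg.norm(x_next - x)
    print(f"directional estimate {d:.3f}  vs  global L {L:.3f}")
    x = x_next
```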

📝 Abstract
We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on $L$-smoothness.
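As a rough illustration of the two classical step-sizes the abstract analyzes, the sketch below runs the Polyak step-size and normalized GD on a small logistic-regression problem. The synthetic data, the lower bound used for $f^*$, and the fixed normalized-GD step are assumptions made for illustration, not the paper's experimental setup; note that neither update rule requires a smoothness constant.

```python
# Minimal sketch (illustrative assumptions, not the paper's experiments):
# run the Polyak step-size and normalized GD on a small logistic-regression
# objective. Neither rule uses a smoothness constant.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = (X @ rng.standard_normal(5) > 0).astype(float)   # labels in {0, 1}

def loss(w):
    z = X @ w
    # Per-sample logistic loss: y*log(1+e^{-z}) + (1-y)*(z + log(1+e^{-z}))
    return np.mean(np.log1p(np.exp(-z)) + (1 - y) * z)

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

f_star = 0.0   # assumed lower bound on the optimal value, used by the Polyak rule
gamma = 0.5    # fixed step for normalized GD (illustrative choice)

w_polyak = np.zeros(5)
w_ngd = np.zeros(5)
for _ in range(100):
    g = grad(w_polyak)
    # Polyak step-size: (f(w) - f*) / ||grad f(w)||^2
    w_polyak -= (loss(w_polyak) - f_star) / (np.dot(g, g) + 1e-12) * g
    g = grad(w_ngd)
    # Normalized GD: fixed step along the unit gradient direction
    w_ngd -= gamma / (np.linalg.norm(g) + 1e-12) * g

print("Polyak loss:", loss(w_polyak), " normalized GD loss:", loss(w_ngd))
```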
Problem

Research questions and friction points this paper is trying to address.

Gradient Descent
Optimization Efficiency
Complex Functions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Directional Smoothness
Polyak Step Size
Normalized Gradient Descent