🤖 AI Summary
This work addresses the slow convergence of gradient descent on poorly conditioned objectives and its reliance on strong global smoothness assumptions. We introduce *directional smoothness*, a geometric quantity that characterizes the local smoothness of the objective along the optimization trajectory, thereby circumventing restrictive global Lipschitz-continuity requirements on the gradient. Leveraging this path-dependent characterization, we derive trajectory-aware suboptimality bounds and formulate an implicit equation whose solutions yield strongly adapted step-sizes. We prove that the Polyak step-size and normalized gradient descent achieve fast, path-adaptive convergence despite using no knowledge of the directional smoothness. Our methodology integrates directional smoothness analysis, implicit step-size design, and convergence theory for both convex and nonconvex settings. Experiments on logistic regression demonstrate that the new bounds substantially improve upon classical $L$-smoothness-based guarantees. Notably, this is the first work to provide path-dependent convergence rates for these two canonical algorithms without requiring prior knowledge of smoothness parameters.
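To make the summary concrete, here is a sketch of the central objects; the notation is ours and may differ from the paper's. A directional smoothness function $D(x, y)$ is any quantity furnishing a local quadratic upper bound between a pair of points,

$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{D(x, y)}{2} \|y - x\|^2,$$

so that for a gradient step $x_{k+1} = x_k - \eta_k \nabla f(x_k)$, minimizing the right-hand side suggests the step-size equation $\eta_k = 1 / D(x_k, x_{k+1})$, which is implicit because $x_{k+1}$ itself depends on $\eta_k$. For a convex quadratic $f(x) = \frac{1}{2} x^\top A x - b^\top x$ the gradient variation is exact, $\nabla f(y) - \nabla f(x) = A(y - x)$, so the ratio $\|\nabla f(y) - \nabla f(x)\| / \|y - x\|$ is a valid choice of $D$ and, evaluated along the step, does not depend on $\eta_k$. The implicit equation then collapses to the closed form

$$\eta_k = \frac{\|\nabla f(x_k)\|}{\|A \, \nabla f(x_k)\|} \ge \frac{1}{\lambda_{\max}(A)} = \frac{1}{L},$$

so the adapted step is never shorter than the classical $1/L$ step.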
📝 Abstract
We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper bounds on the objective. Minimizing these upper bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on $L$-smoothness.
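As a rough illustration of the kind of comparison the experiments describe, below is a minimal numpy sketch, not the authors' code: it runs the Polyak step-size on a synthetic logistic-regression problem and tracks the pointwise gradient-variation ratio between consecutive iterates against the global constant $L$. All helper names, and the shortcut of estimating $f^*$ with a long plain-GD run, are our own assumptions.

```python
# Minimal sketch (not the authors' code): Polyak step-size GD on synthetic
# logistic regression, tracking the pointwise directional-smoothness ratio
# D_k along the trajectory against the global smoothness constant L.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

def loss(w):
    # Binary logistic loss with labels in {-1, +1}.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    # Gradient of the logistic loss; sigmoid written via tanh for stability.
    s = -y * 0.5 * (1.0 + np.tanh(-0.5 * y * (X @ w)))
    return X.T @ s / n

# Global smoothness constant for logistic regression: L = ||X||_2^2 / (4n).
L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)

def directional_ratio(w_prev, w_next):
    # Pointwise gradient-variation ratio between consecutive iterates; one
    # natural instance of a directional smoothness quantity.
    return (np.linalg.norm(grad(w_next) - grad(w_prev))
            / np.linalg.norm(w_next - w_prev))

# The Polyak step-size assumes f* is known; here we estimate it with a long
# run of classical 1/L gradient descent (an assumption of this sketch).
w = np.zeros(d)
for _ in range(5000):
    w -= (1.0 / L) * grad(w)
f_star = loss(w)

w = np.zeros(d)
for k in range(20):
    g = grad(w)
    gap = max(loss(w) - f_star, 1e-12)  # guard against a negative gap
    eta = gap / np.linalg.norm(g) ** 2  # Polyak step-size
    # Normalized-GD alternative: w_next = w - gamma * g / np.linalg.norm(g)
    w_next = w - eta * g
    D_k = directional_ratio(w, w_next)
    # Along the path D_k is typically much smaller than L, which is why
    # path-dependent bounds can be much tighter than the classical analysis.
    print(f"k={k:2d}  f-f*={loss(w) - f_star:.3e}  eta={eta:.3f}  "
          f"D_k={D_k:.3f}  L={L:.3f}")
    w = w_next
```

Note that neither step rule in the loop consults $L$ or $D_k$; the directional ratio is only measured after the fact, matching the claim that these methods adapt to the path without knowing the smoothness parameters.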