Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

📅 2024-09-29
🏛️ arXiv.org
📈 Citations: 5
✨ Influential: 0
🤖 AI Summary
This work challenges the conventional wisdom in optimization that linear convergence of gradient descent requires a quadratic growth condition near the minimizer. It establishes that near-linear convergence is achievable under the weaker condition of fourth-order growth. The key tool is a geometric decomposition theorem: any such function admits a smooth “ravine” manifold through the minimizer, away from which the function grows at least quadratically. This structure motivates an adaptive stepsize strategy that interleaves many short gradient steps with a single Polyak-type long step. The resulting method provably attains near-linear convergence rates on three canonical non-convex problems: matrix sensing, matrix factorization, and learning a single neuron in the over-parameterized regime. These results significantly broaden the class of problems on which gradient descent converges efficiently.

๐Ÿ“ Abstract
A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution -- which we call the ravine -- so that the function grows at least quadratically away from the ravine and has constant order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory and algorithm on the problems of matrix sensing and factorization and learning a single neuron in the overparameterized regime.
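The abstract's recipe can be illustrated with a small sketch. The following is not the authors' exact algorithm, only a hedged toy version of the idea of interlacing short fixed-stepsize gradient steps with a single Polyak long step, tried on f(x) = ‖x‖⁴, which has fourth-order (not quadratic) growth at its minimizer x* = 0. The iteration counts and stepsize are illustrative choices, and the Polyak step assumes the optimal value f* is known (here f* = 0).

```python
# Toy sketch of the short-steps + Polyak-long-step scheme (not the paper's
# exact method), run on f(x) = ||x||^4, a fourth-order-growth function.
import numpy as np

def f(x):
    return float(np.sum(x**2)) ** 2          # f(x) = ||x||^4

def grad(x):
    return 4.0 * np.sum(x**2) * x            # grad f(x) = 4 ||x||^2 x

def interlaced_gd(x0, f_star=0.0, outer=30, inner=5, short_step=1e-3):
    """Alternate `inner` short fixed steps with one Polyak step
    eta = (f(x) - f*) / ||grad f(x)||^2 (assumes f* is known)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(outer):
        for _ in range(inner):               # short, conservative steps
            x = x - short_step * grad(x)
        g = grad(x)
        gnorm2 = float(np.dot(g, g))
        if gnorm2 > 0.0:                     # one Polyak long step
            x = x - (f(x) - f_star) / gnorm2 * g
    return x

x = interlaced_gd(np.array([1.0, -0.5]))
print(f(x))
```

On this example the Polyak step alone contracts the iterate geometrically: for f(x) = ‖x‖⁴ it reduces to x ← 0.75·x, whereas plain short-step gradient descent slows to a sublinear crawl near 0 because the gradient vanishes faster than quadratically.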
Problem

Research questions and friction points this paper is trying to address.

Challenging the belief that linear convergence requires quadratic growth conditions
Proving adaptive gradient descent achieves near-linear convergence under fourth-order growth
Introducing ravine-based stepsize adaptation for efficient optimization in non-quadratic landscapes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive stepsize gradient descent method
Fourth-order growth condition convergence
Ravine manifold with Polyak step interlacing
Damek Davis
Associate Professor, Statistics and Data Science, Wharton, University of Pennsylvania
Optimization · Machine Learning
D. Drusvyatskiy
Department of Mathematics, U. Washington, Seattle, WA 98195
Liwei Jiang
Edwardson School of Industrial Engineering, Purdue University, West Lafayette, IN 47906, USA