🤖 AI Summary
This paper investigates the global convergence of gradient descent for logistic regression under the spherical data assumption, i.e., when all data points lie on the unit sphere. Addressing the open question of whether every step size below the classical stability threshold guarantees convergence, we combine Hessian spectral analysis with dynamical systems theory to characterize periodic behavior in the optimization trajectory. We prove that in one dimension, any step size strictly below the stability threshold ensures global convergence; in high-dimensional non-separable settings, however, we construct explicit counterexamples in which gradient descent enters persistent periodic cycles even when the step size satisfies the standard linear stability condition. This is the first result demonstrating that spherical normalization alone cannot preclude non-convergent dynamics under large step sizes in higher dimensions. Our work challenges the conventional intuition that “stability implies convergence” and provides both a concrete counterexample and a novel analytical framework for large-step-size optimization theory.
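To make the setup concrete, here is a minimal numerical sketch of the quantities involved: the averaged logistic loss on data normalized to the unit sphere, the largest Hessian eigenvalue $\lambda$ at the minimizer, and a GD run with a step size just below the stability threshold $2/\lambda$ to check whether the iterates settle or keep oscillating. The dataset, seed, and step sizes are hypothetical illustrative choices; this does not reproduce the paper's explicit counterexample construction.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad(w, X, y):
    # Gradient of the averaged logistic loss (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)).
    margins = y * (X @ w)
    return -(X.T @ (y * sigmoid(-margins))) / len(y)

def hessian(w, X, y):
    # Hessian: (1/n) * sum_i sigma(z_i) * (1 - sigma(z_i)) * x_i x_i^T, z_i = y_i <x_i, w>.
    margins = y * (X @ w)
    s = sigmoid(margins) * sigmoid(-margins)
    return (X.T * s) @ X / len(y)

# Hypothetical toy data, normalized onto the unit sphere.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.where(X[:, 0] + 0.3 * rng.normal(size=40) > 0, 1.0, -1.0)
# Duplicate one point with the opposite label so the data is certainly non-separable.
X = np.vstack([X, X[:1]])
y = np.append(y, -y[0])

# Locate the minimizer with a conservative step size, then read off lambda there.
w_star = np.zeros(2)
for _ in range(50_000):
    w_star -= 0.5 * grad(w_star, X, y)
lam = np.linalg.eigvalsh(hessian(w_star, X, y)).max()

# Rerun from scratch with a large step size just below 2/lambda and see
# whether the iterates settle at w_star or keep cycling.
w = np.zeros(2)
for _ in range(5_000):
    w -= (1.9 / lam) * grad(w, X, y)
print("||w - w_star|| after 5000 large steps:", np.linalg.norm(w - w_star))
```

A small final distance indicates convergence under the large step size; on a counterexample dataset of the kind the paper constructs, the distance would instead stay bounded away from zero as the iterates cycle.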
📝 Abstract
Gradient descent (GD) on logistic regression has many fascinating properties. When the dataset is linearly separable, it is known that the iterates converge in direction to the maximum-margin separator regardless of how large the step size is. In the non-separable case, however, it has been shown that GD can exhibit a cycling behaviour even when the step size is still below the stability threshold $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. This short paper explores whether restricting the data points to have equal magnitude is a sufficient condition for global convergence under any step size below the stability threshold. We prove that this is true in one dimension, but that in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets, as well as on finding sufficient conditions that guarantee global convergence with large step sizes.
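For intuition about the one-dimensional claim, a minimal sketch (the class counts, step sizes, and starting point are hypothetical illustrative choices, not taken from the paper): on the real line, equal-magnitude data means every $x_i \in \{-1, +1\}$, so after absorbing the signs into the labels the objective reduces to a scalar logistic loss, and a GD run with a near-threshold step size should still settle at the minimizer.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# In 1D, unit-magnitude data means x_i in {-1, +1}; absorbing the signs into the
# labels gives f(w) = (1/n) * sum_i log(1 + exp(-c_i * w)) with c_i in {-1, +1}.
# Hypothetical class counts: 7 examples with c_i = +1 and 3 with c_i = -1.
c = np.array([1.0] * 7 + [-1.0] * 3)

def f_grad(w):
    return -(c * sigmoid(-c * w)).mean()

def f_hess(w):
    # Each x_i^2 = 1, so f''(w) is the mean of sigma(c_i w) * (1 - sigma(c_i w)).
    return (sigmoid(c * w) * sigmoid(-c * w)).mean()

# Find the minimizer w* with a small safe step, then read off lambda = f''(w*).
w_star = 0.0
for _ in range(20_000):
    w_star -= 0.5 * f_grad(w_star)
lam = f_hess(w_star)  # here w* = log(7/3), so lambda = 0.7 * 0.3 = 0.21

# Illustrating the 1D result: a step size strictly below 2/lambda converges
# globally, even starting far from w*.
w = 10.0
for _ in range(2_000):
    w -= (1.99 / lam) * f_grad(w)
print(f"w = {w:.6f}, w* = {w_star:.6f}")
```

The iterates overshoot and oscillate around $w^*$ at first, but with $\eta = 1.99/\lambda < 2/\lambda$ the oscillation contracts, consistent with the one-dimensional convergence guarantee.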