🤖 AI Summary
This paper investigates the global convergence of gradient descent for logistic regression under the spherical data assumption, i.e., when all data points lie on the unit sphere. Addressing the open question of whether every step size below the classical stability threshold guarantees convergence, we combine Hessian spectral analysis with dynamical systems theory to characterize periodic behavior in the optimization trajectory. We prove that in one dimension, any step size strictly below the stability threshold ensures global convergence; in high-dimensional non-separable settings, however, we construct explicit counterexamples in which gradient descent enters persistent periodic cycles even when the step size satisfies the standard linear stability condition. This is the first result demonstrating that spherical normalization alone cannot preclude non-convergent dynamics under large step sizes in higher dimensions. Our work challenges the conventional intuition that “stability implies convergence” and provides both a concrete counterexample and a novel analytical framework for large-step-size optimization theory.
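To make the setup concrete, here is a minimal numerical sketch of the quantities involved: the averaged logistic loss on data normalized to the unit sphere, the largest Hessian eigenvalue $\lambda$ at the minimizer, and a GD run with a step size just below the stability threshold $2/\lambda$ to check whether the iterates settle or keep oscillating. The dataset, seed, and step sizes are hypothetical illustrative choices; this does not reproduce the paper's explicit counterexample construction.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad(w, X, y):
    # Gradient of the averaged logistic loss (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)).
    margins = y * (X @ w)
    return -(X.T @ (y * sigmoid(-margins))) / len(y)

def hessian(w, X, y):
    # Hessian: (1/n) * sum_i sigma(z_i) * (1 - sigma(z_i)) * x_i x_i^T, z_i = y_i <x_i, w>.
    margins = y * (X @ w)
    s = sigmoid(margins) * sigmoid(-margins)
    return (X.T * s) @ X / len(y)

# Hypothetical toy data, normalized onto the unit sphere.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.where(X[:, 0] + 0.3 * rng.normal(size=40) > 0, 1.0, -1.0)
# Duplicate one point with the opposite label so the data is certainly non-separable.
X = np.vstack([X, X[:1]])
y = np.append(y, -y[0])

# Locate the minimizer with a conservative step size, then read off lambda there.
w_star = np.zeros(2)
for _ in range(50_000):
    w_star -= 0.5 * grad(w_star, X, y)
lam = np.linalg.eigvalsh(hessian(w_star, X, y)).max()

# Rerun from scratch with a large step size just below 2/lambda and see
# whether the iterates settle at w_star or keep cycling.
w = np.zeros(2)
for _ in range(5_000):
    w -= (1.9 / lam) * grad(w, X, y)
print("||w - w_star|| after 5000 large steps:", np.linalg.norm(w - w_star))
```

A small final distance indicates convergence under the large step size; on a counterexample dataset of the kind the paper constructs, the distance would instead stay bounded away from zero as the iterates cycle.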
📝 Abstract
Gradient descent (GD) on logistic regression has many fascinating properties. When the dataset is linearly separable, it is known that the iterates converge in direction to the maximum-margin separator regardless of how large the step size is. In the non-separable case, however, it has been shown that GD can exhibit a cycling behaviour even when the step size is still below the stability threshold $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. This short paper explores whether restricting the data points to have equal magnitude is a sufficient condition for global convergence under any step size below the stability threshold. We prove that this is true in one dimension, but that in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets, as well as on finding sufficient conditions that guarantee global convergence with large step sizes.
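For intuition about the one-dimensional claim, a minimal sketch (the class counts, step sizes, and starting point are hypothetical illustrative choices, not taken from the paper): on the real line, equal-magnitude data means every $x_i \in \{-1, +1\}$, so after absorbing the signs into the labels the objective reduces to a scalar logistic loss, and a GD run with a near-threshold step size should still settle at the minimizer.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# In 1D, unit-magnitude data means x_i in {-1, +1}; absorbing the signs into the
# labels gives f(w) = (1/n) * sum_i log(1 + exp(-c_i * w)) with c_i in {-1, +1}.
# Hypothetical class counts: 7 examples with c_i = +1 and 3 with c_i = -1.
c = np.array([1.0] * 7 + [-1.0] * 3)

def f_grad(w):
    return -(c * sigmoid(-c * w)).mean()

def f_hess(w):
    # Each x_i^2 = 1, so f''(w) is the mean of sigma(c_i w) * (1 - sigma(c_i w)).
    return (sigmoid(c * w) * sigmoid(-c * w)).mean()

# Find the minimizer w* with a small safe step, then read off lambda = f''(w*).
w_star = 0.0
for _ in range(20_000):
    w_star -= 0.5 * f_grad(w_star)
lam = f_hess(w_star)  # here w* = log(7/3), so lambda = 0.7 * 0.3 = 0.21

# Illustrating the 1D result: a step size strictly below 2/lambda converges
# globally, even starting far from w*.
w = 10.0
for _ in range(2_000):
    w -= (1.99 / lam) * f_grad(w)
print(f"w = {w:.6f}, w* = {w_star:.6f}")
```

The iterates overshoot and oscillate around $w^*$ at first, but with $\eta = 1.99/\lambda < 2/\lambda$ the oscillation contracts, consistent with the one-dimensional convergence guarantee.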