Negative Stepsizes Make Gradient-Descent-Ascent Converge

📅 2025-05-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the classical non-convergence of gradient descent ascent (GDA) in convex-concave minimax optimization, attributing it to the limitations of conventional fixed, symmetric stepsizes. The authors propose "slingshot stepsize schedules"—time-varying, asymmetric, and periodically negative—and establish, for the first time, that carefully designed negative steps enable unmodified GDA to achieve last-iterate convergence. By de-synchronizing the min and max variables and inducing a net second-order displacement, these schedules generate distinctive "slingshot dynamics." The analysis leverages the non-reversibility of gradient flow and a second-order finite-differencing interpretation, under which slingshot GDA approximately implements consensus optimization. This work provides both a new theoretical foundation and a practical stepsize-design paradigm for minimax training, including generative adversarial networks (GANs).

📝 Abstract
Efficient computation of min-max problems is a central question in optimization, learning, games, and controls. Arguably the most natural algorithm is gradient-descent-ascent (GDA). However, since the 1970s, conventional wisdom has argued that GDA fails to converge even on simple problems. This failure spurred an extensive literature on modifying GDA with additional building blocks such as extragradients, optimism, momentum, anchoring, etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes. The key innovation is the proposal of unconventional stepsize schedules (dubbed slingshot stepsize schedules) that are time-varying, asymmetric, and periodically negative. We show that all three properties are necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems). All of our results apply to the last iterate of GDA, as is typically desired in practice. The core algorithmic intuition is that although negative stepsizes make backward progress, they de-synchronize the min and max variables (overcoming the cycling issue of GDA), and lead to a slingshot phenomenon in which the forward progress in the other iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, the slingshot dynamics leverage the non-reversibility of gradient flow: positive/negative steps cancel to first order, yielding a second-order net movement in a new direction that leads to convergence and is otherwise impossible for GDA to move in. We interpret this as a second-order finite-differencing algorithm and show that, intriguingly, it approximately implements consensus optimization, an empirically popular algorithm for min-max problems involving deep neural networks (e.g., training GANs).
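The failure mode and the fix can be sketched on the classical bilinear counterexample f(x, y) = xy, whose saddle point is (0, 0). The schedule below is a hand-constructed toy illustrating the three ingredients the abstract names (time-varying, asymmetric, periodically negative) — it is not the paper's actual slingshot schedule:

```python
# Toy illustration on f(x, y) = x*y (NOT the paper's schedule):
# fixed symmetric stepsizes make GDA spiral away from the saddle,
# while a time-varying, asymmetric, periodically negative schedule
# contracts toward it.

def gda_step(x, y, alpha, beta):
    """One simultaneous GDA step on f(x, y) = x*y.

    grad_x f = y, grad_y f = x.
    x descends with stepsize alpha; y ascends with stepsize beta.
    """
    return x - alpha * y, y + beta * x

def run(schedule, steps, x=1.0, y=1.0):
    for k in range(steps):
        alpha, beta = schedule(k)
        x, y = gda_step(x, y, alpha, beta)
    return x, y

h = 0.5

# Classical failure: fixed symmetric stepsizes. Each step multiplies
# the distance to the saddle (0, 0) by sqrt(1 + h^2) > 1.
x, y = run(lambda k: (h, h), steps=20)
print((x**2 + y**2) ** 0.5)  # large: iterates spiral outward

# Periodic asymmetric schedule with negative steps:
#   even k: (alpha, beta) = (+h, -h)  -> y takes a "backward" step
#   odd  k: (alpha, beta) = (-h, +h)  -> x takes a "backward" step
# On this toy, each pair of steps contracts both coordinates by
# exactly (1 - h^2): the forward/backward steps cancel to first
# order, leaving a second-order net pull toward the saddle.
slingshot_like = lambda k: (h, -h) if k % 2 == 0 else (-h, h)
x, y = run(slingshot_like, steps=20)
print((x**2 + y**2) ** 0.5)  # small: last iterate approaches (0, 0)
```

Starting from (1, 1) with h = 0.5, one pair of steps lands exactly at (0.75, 0.75), so the distance to the saddle shrinks geometrically, whereas the symmetric schedule grows it by a factor of sqrt(1.25) per step.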
Problem

Research questions and friction points this paper is trying to address.

GDA fails to converge on even simple convex-concave min-max problems
Cycling between the min and max variables prevents last-iterate convergence
Prior fixes (extragradients, optimism, momentum, anchoring) modify GDA itself rather than its stepsizes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using time-varying, asymmetric stepsize schedules
Introducing periodically negative stepsizes
Interpreting the resulting dynamics as second-order finite differencing that approximately implements consensus optimization
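The finite-differencing point can be sketched schematically (this illustrates the mechanism described in the abstract, not the paper's exact derivation). Writing the GDA vector field as $F(z) = (-\nabla_x f, \nabla_y f)$, a forward Euler step of size $+h$ followed by one of size $-h$ cancels to first order:

```latex
z_1 = z + h\,F(z), \qquad
z_2 = z_1 - h\,F(z_1) = z - h^2\, J_F(z)\,F(z) + O(h^3),
```

so the net motion is a second-order step in a direction unreachable by any single positive GDA step. The abstract identifies this second-order term with a consensus-optimization-like update, i.e. an approximate descent step on $\tfrac{1}{2}\|F(z)\|^2$.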