Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization

📅 2023-03-07
🏛️ arXiv.org
📈 Citations: 7
Influential: 2
📄 PDF
🤖 AI Summary
This work studies nonconvex–PL-type nonsmooth minimax optimization, where the objective is nonconvex in the primal variable $x$, nonconcave but satisfies the Polyak–Łojasiewicz (PL) condition in the adversarial variable $y$, and includes a nonsmooth regularizer. To address this setting, we propose Momentum-enhanced Self-adaptive Gradient Descent-Ascent (MSGDA), an algorithm featuring independent, learning-rate-free, AdaGrad-style step-size updates for $x$ and $y$. Theoretically, MSGDA achieves the optimal sample complexity $\tilde{O}(\varepsilon^{-3})$ (i.e., one sample per iteration) to converge to an $\varepsilon$-stationary point. Empirically, we validate MSGDA on PL-games and Wasserstein-GAN tasks, demonstrating both its effectiveness and significant improvements over existing baselines.
📝 Abstract
Minimax optimization has recently been widely applied in many machine learning tasks such as generative adversarial networks, robust learning, and reinforcement learning. In this paper, we study a class of nonconvex-nonconcave minimax optimization problems with nonsmooth regularization, where the objective function is possibly nonconvex in the primal variable $x$, and is nonconcave but satisfies the Polyak-Łojasiewicz (PL) condition in the dual variable $y$. Moreover, we propose a class of enhanced momentum-based gradient descent ascent methods (i.e., MSGDA and AdaMSGDA) to solve these stochastic nonconvex-PL minimax problems. In particular, our AdaMSGDA algorithm can use various adaptive learning rates in updating the variables $x$ and $y$ without relying on any specific form. Theoretically, we prove that our methods attain the best known sample complexity of $\tilde{O}(\epsilon^{-3})$, requiring only one sample per iteration, in finding an $\epsilon$-stationary solution. Numerical experiments on PL-games and Wasserstein-GAN demonstrate the efficiency of our proposed methods.
Problem

Research questions and friction points this paper is trying to address.

Develops algorithms for nonconvex-PL minimax optimization
Enhances adaptive gradient methods for nonsmooth regularization
Improves sample complexity for ε-stationary solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced momentum-based gradient descent ascent methods
Adaptive learning rates for variables x and y
Best-known sample complexity Õ(ε⁻³)