TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization

📅 2022-10-31
🏛️ International Conference on Learning Representations
📈 Citations: 10
Influential: 5
🤖 AI Summary
Gradient descent-ascent (GDA) for nonconvex minimax optimization suffers from sensitivity to hand-tuned hyperparameters, reliance on problem-specific prior knowledge, and a stringent requirement of time-scale separation between primal and dual updates. Method: This paper proposes TiAda, a single-loop adaptive algorithm with a fully parameter-agnostic time-scale adaptation mechanism: it dynamically decouples the effective step sizes of the primal and dual variables via adaptive stepsizes. Contribution/Results: The authors establish convergence guarantees showing that TiAda achieves near-optimal complexity in both deterministic and stochastic settings under nonconvex-strongly-concave assumptions. Empirically, TiAda improves convergence speed and robustness on benchmark tasks, including GAN training and AUC maximization, without hyperparameter tuning.
📝 Abstract
Adaptive gradient methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner, and empirically achieve faster convergence for solving minimization problems. When it comes to nonconvex minimax optimization, however, current convergence analyses of gradient descent ascent (GDA) combined with adaptive stepsizes require careful tuning of hyper-parameters and the knowledge of problem-dependent parameters. Such a discrepancy arises from the primal-dual nature of minimax problems and the necessity of delicate time-scale separation between the primal and dual updates in attaining convergence. In this work, we propose a single-loop adaptive GDA algorithm called TiAda for nonconvex minimax optimization that automatically adapts to the time-scale separation. Our algorithm is fully parameter-agnostic and can achieve near-optimal complexities simultaneously in deterministic and stochastic settings of nonconvex-strongly-concave minimax problems. The effectiveness of the proposed method is further justified numerically for a number of machine learning applications.
Problem

Research questions and friction points this paper is trying to address.

Addresses nonconvex minimax optimization with adaptive time-scale separation
Eliminates need for hyper-parameter tuning in gradient descent ascent
Provides parameter-agnostic convergence for nonconvex-strongly-concave problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time-scale adaptive GDA algorithm for minimax optimization
Parameter-agnostic single-loop adaptive gradient method
Automatically adapts to primal-dual time-scale separation
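The time-scale adaptation above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the toy objective, step sizes, and exponent values (`alpha > beta`) are assumptions chosen to show the mechanism, namely that the primal step is damped by the larger of the two accumulated gradient magnitudes, so the primal variable automatically moves on a slower time scale while the dual gradients remain large.

```python
# Hedged sketch of a TiAda-style single-loop adaptive GDA update;
# names, step sizes, and exponents are illustrative assumptions,
# not the paper's implementation or experiments.

def tiada(grad_x, grad_y, x, y, steps=2000,
          eta_x=0.1, eta_y=0.1, alpha=0.6, beta=0.4):
    """Solve min over x, max over y with adaptive step sizes."""
    vx, vy = 0.0, 0.0  # accumulated squared gradient magnitudes
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        vx += gx ** 2
        vy += gy ** 2
        # Primal step is damped by max(vx, vy): while dual gradients
        # are still large, x slows down automatically, producing the
        # time-scale separation without any tuning.
        x = x - eta_x / max(vx, vy) ** alpha * gx
        y = y + eta_y / vy ** beta * gy
    return x, y

# Toy strongly-concave-in-y objective f(x, y) = x*y - y**2 / 2,
# whose unique stationary point is (0, 0).
x_star, y_star = tiada(lambda x, y: y, lambda x, y: x - y, 1.0, 0.0)
```

With `alpha > beta`, the primal step size shrinks faster than the dual one, which is the adaptive analogue of the two-time-scale stepsize ratio that non-adaptive GDA analyses require as a tuned hyperparameter.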