🤖 AI Summary
This work addresses bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. Existing bilevel methods mainly target lower-level minimization problems, often under strong convexity assumptions, and do not apply directly to this setting. The paper develops penalty-based first-order methods that do not require strong convexity of the lower-level problem. In the deterministic setting, the proposed method finds an ε-KKT point with Õ(ε⁻⁴) oracle complexity. Via Lagrangian duality, bilevel problems with convex constrained lower-level minimization are reformulated as special cases of the framework, which yields an Õ(ε⁻⁴) bound that improves on the existing Õ(ε⁻⁷) result for that problem class. In the stochastic setting, where only stochastic gradient oracles are available, the method finds a nearly ε-KKT point with Õ(ε⁻⁹) oracle complexity.
📝 Abstract
We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $ε$-KKT point with $\tilde{O}(ε^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(ε^{-4})$ complexity bound that improves upon the existing $\tilde{O}(ε^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $ε$-KKT point with $\tilde{O}(ε^{-9})$ oracle complexity.
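The Lagrangian duality reformulation mentioned in the abstract can be illustrated with a standard identity. This is a sketch under assumed notation, not the paper's own formulation: $x$ denotes the upper-level variable, $z$ the lower-level variable, $g(x,\cdot)$ a convex lower-level objective, $h(x,\cdot)$ a vector of convex constraint functions, and $\lambda$ the multiplier vector. For fixed $x$, the constrained lower-level minimization can be written as a minimax problem in $(z, \lambda)$:

$$
\min_{z}\ \bigl\{\, g(x,z) \;:\; h(x,z) \le 0 \,\bigr\}
\;=\;
\min_{z}\ \max_{\lambda \ge 0}\ \bigl[\, g(x,z) + \lambda^{\top} h(x,z) \,\bigr].
$$

The identity holds because the inner maximum equals $g(x,z)$ when $h(x,z) \le 0$ and $+\infty$ otherwise. Under a constraint qualification such as Slater's condition (an assumption here; the conditions the paper actually uses are not stated in the abstract), strong duality also allows the $\min$ and $\max$ to be exchanged, so the lower-level solution corresponds to a saddle point of the Lagrangian, which is a minimax lower-level problem of the kind the framework targets.

As a rough sense of scale for the stated bounds, and ignoring constants and logarithmic factors: at $ε = 10^{-2}$, $ε^{-4} = 10^{8}$ while $ε^{-7} = 10^{14}$, a gap of $ε^{-3} = 10^{6}$.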