🤖 AI Summary
This work addresses the global convergence of independent policy gradient methods in two-player zero-sum convex Markov games (cMGs). Fundamental challenges, including infinite horizons, nonconvex policy parameterizations, and the absence of Bellman consistency, hinder existing analyses. To overcome these, we introduce a novel *hidden-convex–hidden-concave* regularization framework that reformulates the original min-max problem into an objective satisfying the nonconvex-proximal Polyak–Łojasiewicz (NC-pPL) condition. Building on this, we propose stochastic nested and alternating gradient descent–ascent algorithms. We establish the first global convergence guarantee to Nash equilibria in cMGs, with an explicit sublinear convergence rate. Our analysis generalizes beyond cMGs: the methodology extends directly to generic constrained min-max optimization problems, offering broad applicability in nonconvex game-theoretic and adversarial learning settings.
📝 Abstract
We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) using independent policy gradient methods. Convex Markov games, recently defined by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with preferences that are convex over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity, the absence of Bellman consistency, and the complexity of the infinite horizon. We follow a two-step approach. First, leveraging properties of hidden-convex–hidden-concave functions, we show that a simple nonconvex regularization transforms the min-max optimization problem into a nonconvex-proximal Polyak–Łojasiewicz (NC-pPL) objective. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent–ascent methods, which we believe may be of independent interest.
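The stabilizing effect of regularization claimed in the abstract can be illustrated on a toy bilinear saddle point, far simpler than a cMG. This is a minimal sketch, not the paper's algorithm: on f(x, y) = xy, simultaneous gradient descent–ascent spirals away from the equilibrium (0, 0), while adding a strongly convex–strongly concave regularizer (λ/2)x² − (λ/2)y² makes the iterates contract toward it.

```python
def gda(lam, eta=0.1, steps=200, x=1.0, y=1.0):
    """Simultaneous gradient descent-ascent on the regularized objective
    f(x, y) = x*y + (lam/2)*x**2 - (lam/2)*y**2 (toy illustration only)."""
    for _ in range(steps):
        gx = y + lam * x  # gradient of the regularized objective in x
        gy = x - lam * y  # gradient of the regularized objective in y
        x, y = x - eta * gx, y + eta * gy  # descent in x, ascent in y
    return (x ** 2 + y ** 2) ** 0.5  # distance to the equilibrium (0, 0)

print(gda(lam=0.0))  # unregularized: iterates spiral outward, distance grows
print(gda(lam=0.5))  # regularized: iterates contract toward the equilibrium
```

Here the regularized game shares its equilibrium (0, 0) with the original bilinear game, so stabilizing the iterates does not move the solution; in the paper's setting, controlling the bias introduced by regularization is part of the analysis.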