Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity

📅 2023-09-08

🏛️ IEEE Conference on Decision and Control

📈 Citations: 7

✨ Influential: 1

🤖 AI Summary

This paper addresses model-free learning of Nash equilibria in zero-sum linear-quadratic (LQ) games with unknown system dynamics. It proposes a nested zeroth-order natural policy gradient algorithm that works with either single-point or two-point gradient estimators and achieves, for the first time, sample complexities of $\tilde{\mathcal{O}}(\varepsilon^{-2})$ (two-point) and $\tilde{\mathcal{O}}(\varepsilon^{-3})$ (single-point), substantially improving on the existing polynomial upper bounds. The algorithm preserves the implicit regularization property, which keeps the controller robust during learning, and is theoretically guaranteed to converge to an $\varepsilon$-neighborhood of the Nash equilibrium. Key contributions are: (1) a lightweight nested iterative architecture that avoids costly high-dimensional Hessian estimation; (2) a tight analytical framework coupling zeroth-order optimization with LQ game policy gradients; and (3) state-of-the-art sample efficiency attained without system identification or explicit model knowledge.

📝 Abstract

Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control and can be used (i) as a dynamic game formulation for risk-sensitive or robust control, or (ii) as a benchmark setting for multi-agent reinforcement learning with two competing agents in continuous state-control spaces. In contrast to the well-studied single-agent linear quadratic regulator problem, zero-sum LQ games entail solving a challenging nonconvex-nonconcave min-max problem with an objective function that lacks coercivity. Recently, Zhang et al. [1] discovered an implicit regularization property of natural policy gradient methods which is crucial for safety-critical control systems since it preserves the robustness of the controller during learning. Moreover, in the model-free setting where the knowledge of model parameters is not available, Zhang et al. proposed the first polynomial sample complexity algorithm to reach an $\epsilon$-neighborhood of the Nash equilibrium while maintaining the desirable implicit regularization property. In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm improving sample complexity by several orders of magnitude. Our main result guarantees a $\tilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity under the same assumptions using a single-point ZO estimator. Furthermore, when the estimator is replaced by a two-point estimator, our method enjoys a better $\tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity. Our key improvements rely on a more sample-efficient nested algorithm design and finer control of the ZO natural gradient estimation error.
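To make the difference between the two estimator types concrete, here is a minimal sketch of generic single-point and two-point zeroth-order gradient estimators with uniform sphere smoothing. It is an illustration under assumptions, not the paper's estimator: the paper works with natural policy gradients over controller gain matrices, whereas this sketch uses a plain vector and a toy quadratic cost. All function names (`sample_unit_sphere`, `single_point_estimate`, `two_point_estimate`, `averaged`, `quadratic`) and the smoothing radius are hypothetical.

```python
import math
import random

def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def single_point_estimate(f, x, radius, rng):
    """One function evaluation per sample: (d / r) * f(x + r u) * u."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    value = f([xi + radius * ui for xi, ui in zip(x, u)])
    return [(d / radius) * value * ui for ui in u]

def two_point_estimate(f, x, radius, rng):
    """Two evaluations per sample: (d / (2 r)) * (f(x + r u) - f(x - r u)) * u."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    plus = f([xi + radius * ui for xi, ui in zip(x, u)])
    minus = f([xi - radius * ui for xi, ui in zip(x, u)])
    return [(d / (2.0 * radius)) * (plus - minus) * ui for ui in u]

def averaged(estimator, f, x, radius, n_samples, seed=0):
    """Average n_samples independent estimates to reduce variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        g = estimator(f, x, radius, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]

def quadratic(x):
    """Toy cost (an assumption for illustration); its true gradient is 2x."""
    return sum(xi * xi for xi in x)

x0 = [1.0, -2.0, 0.5]
g1 = averaged(single_point_estimate, quadratic, x0, 0.1, 2000)
g2 = averaged(two_point_estimate, quadratic, x0, 0.1, 2000)
```

With the same number of samples, the two-point average `g2` should sit close to the true gradient `[2.0, -4.0, 1.0]`, while the single-point average `g1` is typically much noisier because each sample scales with the function value divided by the radius. This variance gap is the usual intuition for why two-point feedback yields a better sample complexity.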

Problem

Research questions and friction points this paper is trying to address.

Learning Nash equilibrium in zero-sum linear quadratic games

Improving sample complexity for model-free policy optimization

Achieving last-iterate convergence in nonconvex-nonconcave min-max problems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simpler nested Zeroth-Order algorithm design

Global last-iterate linear convergence guarantee

Improved sample complexity with single-point estimator
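The nested structure named above can be sketched on a toy problem. The following is a generic nested zeroth-order descent-ascent loop, assuming the inner loop approximately solves the maximization for a fixed minimizing player and the outer loop then takes one descent step. It is not the paper's algorithm: it uses plain (not natural) gradients, a hypothetical strongly-convex-strongly-concave payoff instead of an LQ game, and made-up step sizes and iteration counts.

```python
import math
import random

def payoff(x, y):
    """Toy zero-sum payoff |x|^2 + 2 x.y - |y|^2; x minimizes, y maximizes.
    Its unique saddle point (the Nash equilibrium) is x = y = 0."""
    return (sum(a * a for a in x) + 2.0 * sum(a * b for a, b in zip(x, y))
            - sum(b * b for b in y))

def unit_direction(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def two_point_grad(f, z, radius, rng):
    """Two-point zeroth-order gradient estimate of f at z."""
    d = len(z)
    u = unit_direction(d, rng)
    plus = f([a + radius * b for a, b in zip(z, u)])
    minus = f([a - radius * b for a, b in zip(z, u)])
    return [d * (plus - minus) / (2.0 * radius) * b for b in u]

def inner_max(x, rng, steps=30, lr=0.1, radius=0.01):
    """Inner loop: approximate the maximizer's best response to a fixed x."""
    y = [0.0] * len(x)
    for _ in range(steps):
        g = two_point_grad(lambda v: payoff(x, v), y, radius, rng)
        y = [a + lr * b for a, b in zip(y, g)]  # ascent step
    return y

def nested_zo(x0, outer_steps=60, lr=0.05, radius=0.01, seed=0):
    """Outer loop: one descent step on x per approximate inner solution."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(outer_steps):
        y = inner_max(x, rng)
        g = two_point_grad(lambda v: payoff(v, y), x, radius, rng)
        x = [a - lr * b for a, b in zip(x, g)]  # descent step
    return x, inner_max(x, rng)

x_final, y_final = nested_zo([1.0, -2.0])
```

On this toy payoff both `x_final` and `y_final` should end up near the origin, using only function evaluations and no gradient or model information. Every outer step pays for a full inner solve, so the total sample count depends on how cheaply each inner solve and each gradient estimate can be made.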

🔎 Similar Papers

Learning in Zero-Sum Markov Games: Relaxing Strong Reachability and Mixing Time Assumptions