Deep SOR Minimax Q-learning for Two-player Zero-sum Game

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the slow convergence and lack of theoretical guarantees of Q-learning with function approximation in high-dimensional two-player zero-sum games, this paper proposes Deep Successive Over-Relaxation Minimax Q-learning (Deep SOR minimax Q-learning). It is the first to integrate successive over-relaxation (SOR) into a deep reinforcement learning framework, accelerating value iteration by reducing the contraction factor of the associated Bellman operator. The authors establish a finite-time convergence guarantee under standard assumptions. The algorithm employs deep neural networks to jointly approximate the value function and the players' policies. Empirical evaluations show that Deep SOR minimax Q-learning outperforms baseline Q-learning algorithms in convergence speed and stability, and ablation studies reveal a systematic dependence of performance on the SOR relaxation parameter, identifying a range that balances acceleration and numerical robustness.
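To make the contraction-factor claim concrete, the following is a hedged sketch of how SOR enters the minimax Q-Bellman operator, following the tabular SOR Q-learning literature the abstract builds on; the notation here (relaxation parameter $w$, matrix-game value $\mathrm{val}$) is assumed for illustration, and the paper's exact operator may differ.

```latex
% Sketch of an SOR minimax Q-Bellman operator (assumed notation, not taken
% from the paper): w is the relaxation parameter, gamma the discount factor,
% and val(Q(s,.,.)) is the minimax value of the stage matrix game at state s.
\[
(T_w Q)(s,a,b) = w \Big[ r(s,a,b) + \gamma \sum_{s'} p(s' \mid s,a,b)\,
\mathrm{val}\big(Q(s',\cdot,\cdot)\big) \Big]
+ (1-w)\,\mathrm{val}\big(Q(s,\cdot,\cdot)\big),
\]
\[
\text{where } \mathrm{val}\big(Q(s,\cdot,\cdot)\big)
= \max_{\mu \in \Delta_A} \min_{\nu \in \Delta_B} \mu^{\top} Q(s,\cdot,\cdot)\,\nu .
\]
```

For $w = 1$ this reduces to the standard minimax Bellman operator with contraction factor $\gamma$; in the tabular SOR analysis, a suitable $w > 1$ (whose admissible range depends on the self-transition probabilities) yields the smaller factor $1 - w(1-\gamma)$, e.g. $0.98$ instead of $0.99$ for $\gamma = 0.99$ and $w = 2$.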

📝 Abstract
In this work, we consider the problem of a two-player zero-sum game. In the literature, the successive over-relaxation (SOR) Q-learning algorithm has been developed and implemented, and it is seen to yield a lower contraction factor for the associated Q-Bellman operator, resulting in a faster value-iteration procedure. However, it has been presented only for the tabular case and not for the function-approximation setting that typically caters to real-world high-dimensional state-action spaces. Furthermore, neither setting has been considered for two-player zero-sum games. We thus propose a deep successive over-relaxation minimax Q-learning algorithm that incorporates deep neural networks as function approximators and is suitable for high-dimensional spaces. We prove the finite-time convergence of the proposed algorithm. Through numerical experiments, we show the effectiveness of the proposed method over the existing Q-learning algorithm. Our ablation studies demonstrate the effect of different values of the crucial successive over-relaxation parameter.
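The abstract gives no pseudocode, so below is a minimal PyTorch-style sketch of how an SOR-relaxed TD target for a minimax Q-network might look; every name here (`QNet`, `maximin_value`, `sor_w`, and so on) is an illustrative assumption rather than the authors' implementation, and the pure-strategy maximin is a simplification of the exact mixed-strategy matrix-game value.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: names and shapes are assumptions, not the
# paper's code. Both players have discrete actions, and the network
# outputs a stage-game matrix Q(s, a, b) of shape [batch, n_a, n_b].
class QNet(nn.Module):
    def __init__(self, state_dim, n_a, n_b, hidden=128):
        super().__init__()
        self.n_a, self.n_b = n_a, n_b
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_a * n_b),
        )

    def forward(self, s):                       # s: [batch, state_dim]
        return self.body(s).view(-1, self.n_a, self.n_b)

def maximin_value(q_matrix):
    """Pure-strategy maximin value max_a min_b Q(s, a, b), per batch item.

    The exact matrix-game value requires a small LP over mixed strategies;
    the pure-strategy surrogate keeps this sketch self-contained."""
    return q_matrix.min(dim=2).values.max(dim=1).values   # [batch]

def sor_minimax_target(r, s, s_next, done, target_net, gamma=0.99, sor_w=1.5):
    """SOR-relaxed TD target: the usual minimax bootstrap, weighted by w,
    plus (1 - w) times the maximin value at the *current* state, mirroring
    the tabular SOR Q-Bellman operator. `done` is a 0/1 float tensor."""
    with torch.no_grad():
        v_next = maximin_value(target_net(s_next))        # val(Q(s', ., .))
        v_curr = maximin_value(target_net(s))             # val(Q(s,  ., .))
        standard = r + gamma * (1.0 - done) * v_next
        return sor_w * standard + (1.0 - sor_w) * v_curr
```

In a full training loop this target would feed a standard regression loss against the online network's Q-value at the sampled joint action, with the target network synced periodically, as in DQN-style training.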
Problem

Research questions and friction points this paper is trying to address.

Extending SOR Q-learning to function approximation settings
Developing deep minimax Q-learning for high-dimensional games
Analyzing convergence and parameter effects in zero-sum games
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep neural networks approximate high-dimensional state-action spaces
Successive over-relaxation accelerates minimax Q-learning convergence (quantified in the worked example after this list)
Algorithm provides finite-time convergence guarantee for zero-sum games
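The acceleration from a smaller contraction factor can be quantified with the standard geometric-convergence bound for a $\beta$-contraction; the numbers below are illustrative, not taken from the paper's experiments.

```latex
% Iterations for a beta-contraction to reach epsilon accuracy scale as
% log(1/epsilon) / log(1/beta). Comparing the standard factor gamma with
% the SOR factor 1 - w(1 - gamma):
\[
\frac{N_{\mathrm{SOR}}}{N_{\mathrm{std}}}
= \frac{\log(1/\gamma)}{\log\!\big(1/(1 - w(1-\gamma))\big)}
\approx \frac{\log(1/0.99)}{\log(1/0.98)} \approx 0.50
\qquad (\gamma = 0.99,\; w = 2),
\]
```

i.e. roughly half as many value-iteration sweeps for the same accuracy, under the tabular contraction bound.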
Saksham Gautam
Department of Computer Science and Automation, Indian Institute of Science, Bengaluru, India
Lakshmi Mandal
Department of Computer Science and Automation, Indian Institute of Science, Bengaluru, India
Shalabh Bhatnagar
Professor in the Department of Computer Science and Automation, Indian Institute of Science
Stochastic systems · control · simulation · optimization