Deep SOR Minimax Q-learning for Two-player Zero-sum Game

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the slow convergence and lack of theoretical guarantees of Q-learning with function approximation in high-dimensional two-player zero-sum games, this paper proposes Deep Successive Over-Relaxation Minimax Q-learning (Deep SOR minimax Q-learning). It is the first to integrate successive over-relaxation (SOR) into a deep reinforcement learning framework, accelerating value iteration by reducing the contraction factor of the associated Bellman operator. The authors establish a finite-time convergence guarantee under standard assumptions. The algorithm employs deep neural networks to jointly approximate the value function and the players' policies. Empirical evaluations show that Deep SOR minimax Q-learning outperforms baseline Q-learning algorithms in convergence speed and stability, and ablation studies reveal a systematic dependence of performance on the SOR relaxation parameter, identifying a range that balances acceleration and numerical robustness.
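To make the contraction-factor claim concrete, the following is a hedged sketch of how SOR enters the minimax Q-Bellman operator, following the tabular SOR Q-learning literature the abstract builds on; the notation here (relaxation parameter $w$, matrix-game value $\mathrm{val}$) is assumed for illustration, and the paper's exact operator may differ.

```latex
% Sketch of an SOR minimax Q-Bellman operator (assumed notation, not taken
% from the paper): w is the relaxation parameter, gamma the discount factor,
% and val(Q(s,.,.)) is the minimax value of the stage matrix game at state s.
\[
(T_w Q)(s,a,b) = w \Big[ r(s,a,b) + \gamma \sum_{s'} p(s' \mid s,a,b)\,
\mathrm{val}\big(Q(s',\cdot,\cdot)\big) \Big]
+ (1-w)\,\mathrm{val}\big(Q(s,\cdot,\cdot)\big),
\]
\[
\text{where } \mathrm{val}\big(Q(s,\cdot,\cdot)\big)
= \max_{\mu \in \Delta_A} \min_{\nu \in \Delta_B} \mu^{\top} Q(s,\cdot,\cdot)\,\nu .
\]
```

For $w = 1$ this reduces to the standard minimax Bellman operator with contraction factor $\gamma$; in the tabular SOR analysis, a suitable $w > 1$ (whose admissible range depends on the self-transition probabilities) yields the smaller factor $1 - w(1-\gamma)$, e.g. $0.98$ instead of $0.99$ for $\gamma = 0.99$ and $w = 2$.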

📝 Abstract
In this work, we consider the problem of a two-player zero-sum game. In the literature, the successive over-relaxation (SOR) Q-learning algorithm has been developed and implemented, and it is seen to yield a lower contraction factor for the associated Q-Bellman operator, resulting in a faster value-iteration procedure. However, it has been presented only for the tabular case and not for the function-approximation setting that typically caters to real-world high-dimensional state-action spaces. Furthermore, neither setting has been considered for two-player zero-sum games. We thus propose a deep successive over-relaxation minimax Q-learning algorithm that incorporates deep neural networks as function approximators and is suitable for high-dimensional spaces. We prove the finite-time convergence of the proposed algorithm. Through numerical experiments, we show the effectiveness of the proposed method over the existing Q-learning algorithm. Our ablation studies demonstrate the effect of different values of the crucial successive over-relaxation parameter.
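The abstract gives no pseudocode, so below is a minimal PyTorch-style sketch of how an SOR-relaxed TD target for a minimax Q-network might look; every name here (`QNet`, `maximin_value`, `sor_w`, and so on) is an illustrative assumption rather than the authors' implementation, and the pure-strategy maximin is a simplification of the exact mixed-strategy matrix-game value.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: names and shapes are assumptions, not the
# paper's code. Both players have discrete actions, and the network
# outputs a stage-game matrix Q(s, a, b) of shape [batch, n_a, n_b].
class QNet(nn.Module):
    def __init__(self, state_dim, n_a, n_b, hidden=128):
        super().__init__()
        self.n_a, self.n_b = n_a, n_b
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_a * n_b),
        )

    def forward(self, s):                       # s: [batch, state_dim]
        return self.body(s).view(-1, self.n_a, self.n_b)

def maximin_value(q_matrix):
    """Pure-strategy maximin value max_a min_b Q(s, a, b), per batch item.

    The exact matrix-game value requires a small LP over mixed strategies;
    the pure-strategy surrogate keeps this sketch self-contained."""
    return q_matrix.min(dim=2).values.max(dim=1).values   # [batch]

def sor_minimax_target(r, s, s_next, done, target_net, gamma=0.99, sor_w=1.5):
    """SOR-relaxed TD target: the usual minimax bootstrap, weighted by w,
    plus (1 - w) times the maximin value at the *current* state, mirroring
    the tabular SOR Q-Bellman operator. `done` is a 0/1 float tensor."""
    with torch.no_grad():
        v_next = maximin_value(target_net(s_next))        # val(Q(s', ., .))
        v_curr = maximin_value(target_net(s))             # val(Q(s,  ., .))
        standard = r + gamma * (1.0 - done) * v_next
        return sor_w * standard + (1.0 - sor_w) * v_curr
```

In a full training loop this target would feed a standard regression loss against the online network's Q-value at the sampled joint action, with the target network synced periodically, as in DQN-style training.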
Problem

Research questions and friction points this paper is trying to address.

Extending SOR Q-learning to function approximation settings
Developing deep minimax Q-learning for high-dimensional games
Analyzing convergence and parameter effects in zero-sum games
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep neural networks approximate high-dimensional state-action spaces
Successive over-relaxation accelerates minimax Q-learning convergence (quantified in the worked example after this list)
Algorithm provides finite-time convergence guarantee for zero-sum games
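The acceleration from a smaller contraction factor can be quantified with the standard geometric-convergence bound for a $\beta$-contraction; the numbers below are illustrative, not taken from the paper's experiments.

```latex
% Iterations for a beta-contraction to reach epsilon accuracy scale as
% log(1/epsilon) / log(1/beta). Comparing the standard factor gamma with
% the SOR factor 1 - w(1 - gamma):
\[
\frac{N_{\mathrm{SOR}}}{N_{\mathrm{std}}}
= \frac{\log(1/\gamma)}{\log\!\big(1/(1 - w(1-\gamma))\big)}
\approx \frac{\log(1/0.99)}{\log(1/0.98)} \approx 0.50
\qquad (\gamma = 0.99,\; w = 2),
\]
```

i.e. roughly half as many value-iteration sweeps for the same accuracy, under the tabular contraction bound.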
Saksham Gautam
Department of Computer Science and Automation, Indian Institute of Science, Bengaluru, India
Lakshmi Mandal
Department of Computer Science and Automation, Indian Institute of Science, Bengaluru, India
Shalabh Bhatnagar
Professor in the Department of Computer Science and Automation, Indian Institute of Science
Stochastic systems · control · simulation · optimization