Convergent Q-Learning for Infinite-Horizon General-Sum Markov Games through Behavioral Economics

📅 2025-08-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the computation of risk-averse quantal response equilibria (RQE) in two-player normal-form games and discounted infinite-horizon general-sum Markov games, aiming to model human risk aversion and bounded rationality more faithfully than a Nash equilibrium. Methodologically, the authors first establish uniqueness and Lipschitz continuity of RQE under monotonicity assumptions; leveraging this, they define a risk-averse quantal-response Bellman operator, prove it is a contraction under conditions on the players' risk aversion, bounded rationality, and temporal discounting, and integrate it into the Q-learning framework, yielding a multi-agent reinforcement learning algorithm with rigorous convergence guarantees. This extends prior finite-horizon tractability results for RQE to the infinite-horizon setting, overcoming the convergence bottleneck of existing RQE algorithms there, and provides a theoretical and algorithmic tool for behavioral game theory and robust multi-agent learning.
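The two behavioral ingredients of RQE can be made concrete in code. The sketch below is an illustration only, not the paper's exact construction: it risk-adjusts payoffs with the entropic risk measure and then applies a logit (softmax) choice rule; `theta` (risk aversion) and `eta` (rationality) are illustrative parameter names.

```python
import numpy as np

def entropic_risk(payoffs, probs, theta):
    """Entropic risk measure rho(X) = -(1/theta) * log E[exp(-theta * X)].
    Larger theta > 0 means more risk aversion; theta -> 0 recovers E[X]."""
    return -np.log(probs @ np.exp(-theta * payoffs)) / theta

def quantal_response(A, opponent_probs, theta, eta):
    """Risk-averse logit quantal response: risk-adjust each action's payoff
    row against the opponent's mixed strategy, then softmax with
    rationality parameter eta (eta -> infinity approaches a best response)."""
    adjusted = np.array([entropic_risk(row, opponent_probs, theta) for row in A])
    z = eta * adjusted
    z -= z.max()                         # numerical stabilization
    p = np.exp(z)
    return p / p.sum()

# Matching-pennies payoffs for player 1; a uniform opponent yields a
# uniform quantal response by symmetry.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
p = quantal_response(A, np.array([0.5, 0.5]), theta=1.0, eta=2.0)
```

Note how both limits recover familiar objects: `theta -> 0` removes risk aversion, and `eta -> infinity` removes bounded rationality, collapsing back toward a standard best response.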

📝 Abstract
Risk-aversion and bounded rationality are two key characteristics of human decision-making. Risk-averse quantal-response equilibrium (RQE) is a solution concept that incorporates these features, providing a more realistic depiction of human decision-making in various strategic environments than a Nash equilibrium. Furthermore, a class of RQE has recently been shown in arXiv:2406.14156 to be universally computationally tractable in all finite-horizon Markov games, allowing for the development of multi-agent reinforcement learning algorithms with convergence guarantees. In this paper, we expand upon the study of RQE and analyze their computation in both two-player normal-form games and discounted infinite-horizon Markov games. For normal-form games we adopt a monotonicity-based approach that allows us to generalize previous results. We first show uniqueness and Lipschitz continuity of RQE with respect to the players' payoff matrices under monotonicity assumptions, and then provide conditions on the players' degrees of risk aversion and bounded rationality that ensure monotonicity. We then focus on discounted infinite-horizon Markov games. We define the risk-averse quantal-response Bellman operator and prove its contraction under further conditions on the players' risk aversion, bounded rationality, and temporal discounting. This yields a Q-learning-based algorithm with convergence guarantees for all infinite-horizon general-sum Markov games.
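The fixed-point structure behind the contraction argument can be sketched numerically. The toy below is an assumption-laden illustration, not the paper's operator: it uses random rewards and transitions, plain logit stage responses without the risk adjustment, and `gamma` and `eta` chosen small enough that the operator empirically contracts, echoing the abstract's conditions on discounting and bounded rationality.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2          # tiny illustrative two-player Markov game
gamma = 0.5                         # discount factor, kept small so the operator contracts
eta = 1.0                           # bounded-rationality (inverse-temperature) parameter

# Hypothetical random stage rewards and transition kernel P[s, a1, a2, s']
r1 = rng.uniform(-1, 1, (n_states, n_actions, n_actions))
r2 = rng.uniform(-1, 1, (n_states, n_actions, n_actions))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions, n_actions))

def softmax(x):
    z = np.exp(eta * (x - x.max()))
    return z / z.sum()

def stage_qre(M1, M2, iters=200):
    """Approximate a quantal-response equilibrium of the stage game
    (M1, M2) by damped iteration of logit best responses."""
    p1 = np.full(M1.shape[0], 1.0 / M1.shape[0])
    p2 = np.full(M1.shape[1], 1.0 / M1.shape[1])
    for _ in range(iters):
        p1 = 0.5 * p1 + 0.5 * softmax(M1 @ p2)     # player 1 responds to p2
        p2 = 0.5 * p2 + 0.5 * softmax(M2.T @ p1)   # player 2 responds to p1
    return p1, p2

def bellman(Q1, Q2):
    """One application of a quantal-response Bellman-style operator:
    solve each state's stage game, then back up the resulting values."""
    V1, V2 = np.zeros(n_states), np.zeros(n_states)
    for s in range(n_states):
        p1, p2 = stage_qre(Q1[s], Q2[s])
        V1[s] = p1 @ Q1[s] @ p2
        V2[s] = p1 @ Q2[s] @ p2
    return r1 + gamma * P @ V1, r2 + gamma * P @ V2

# Fixed-point iteration: when the operator is a contraction, the sup-norm
# gap between successive iterates shrinks geometrically.
Q1 = np.zeros((n_states, n_actions, n_actions))
Q2 = np.zeros_like(Q1)
for _ in range(200):
    Q1_next, Q2_next = bellman(Q1, Q2)
    gap = max(np.abs(Q1_next - Q1).max(), np.abs(Q2_next - Q2).max())
    Q1, Q2 = Q1_next, Q2_next
```

In the model-free setting the paper targets, the exact backup `r + gamma * P @ V` would be replaced by stochastic Q-learning updates from sampled transitions; the contraction property is what secures their convergence.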
Problem

Research questions and friction points this paper is trying to address.

Extend risk-averse quantal-response equilibrium to infinite-horizon games
Ensure computational tractability in general-sum Markov games
Develop convergent Q-learning for human-like decision-making models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Risk-averse quantal-response equilibrium solution concept
Monotonicity-based approach for normal form games
Q-learning algorithm for infinite-horizon Markov games
Yizhou Zhang
Department of Computing and Mathematical Sciences, California Institute of Technology
Eric Mazumdar
Assistant Professor, California Institute of Technology