🤖 AI Summary
This paper studies the computation of risk-averse quantal response equilibria (RQE), a solution concept that captures human risk aversion and bounded rationality, in two-player normal-form games and discounted infinite-horizon Markov games. Methodologically, the authors first establish uniqueness and Lipschitz continuity of RQE in normal-form games via a monotonicity-based analysis, and give conditions on the players' degrees of risk aversion and bounded rationality that ensure the required monotonicity. They then define a risk-averse quantal-response Bellman operator and prove that it is a contraction under further conditions on the players' risk aversion, bounded rationality, and temporal discounting; integrating this operator into the Q-learning framework yields a multi-agent reinforcement learning algorithm with convergence guarantees for all infinite-horizon general-sum Markov games. This extends the known finite-horizon tractability of RQE to the infinite-horizon setting and provides a new theoretical and algorithmic tool for behavioral game theory and multi-agent learning.
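The contraction mechanism behind the Bellman-operator argument can be illustrated in a stripped-down, single-agent form. The sketch below uses an entropic risk measure over the next state and a logit (soft) value, which are assumptions chosen for illustration; the paper's actual operator couples both players' quantal responses at each state, which this simplification omits.

```python
import numpy as np

def soft_risk_bellman(Q, P, c, gamma, theta, tau):
    # One application of a smoothed, risk-sensitive Bellman operator:
    #   (TQ)(s,a) = c(s,a) + (gamma/theta) * log sum_s' P(s'|s,a) exp(theta * v(s'))
    # where v(s') = -tau * log sum_a exp(-Q(s',a)/tau) is the logit (soft) value.
    # Both the soft-min over actions and the entropic risk over next states are
    # 1-Lipschitz in the sup norm, so T is a gamma-contraction.
    v = -tau * np.log(np.exp(-Q / tau).sum(axis=1))          # soft value per state
    risk = np.log(np.einsum('san,n->sa', P, np.exp(theta * v))) / theta
    return c + gamma * risk

# Hypothetical random MDP for illustration (S states, A actions).
rng = np.random.default_rng(0)
S, A_n = 4, 3
P = rng.random((S, A_n, S))
P /= P.sum(axis=2, keepdims=True)   # transition kernel P(s'|s,a)
c = rng.random((S, A_n))            # stage costs
gamma, theta, tau = 0.9, 0.5, 1.0

Q = np.zeros((S, A_n))
for _ in range(500):
    Q = soft_risk_bellman(Q, P, c, gamma, theta, tau)  # converges by contraction
```

Value iteration with this operator converges geometrically at rate `gamma`, which is the same mechanism that underpins the convergence guarantee of the paper's Q-learning scheme.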
📝 Abstract
Risk aversion and bounded rationality are two key characteristics of human decision-making. Risk-averse quantal response equilibrium (RQE) is a solution concept that incorporates both features, providing a more realistic depiction of human decision-making in various strategic environments than a Nash equilibrium. Furthermore, a class of RQE has recently been shown in arXiv:2406.14156 to be universally computationally tractable in all finite-horizon Markov games, allowing for the development of multi-agent reinforcement learning algorithms with convergence guarantees. In this paper, we expand upon the study of RQE and analyze its computation in both two-player normal-form games and discounted infinite-horizon Markov games. For normal-form games we adopt a monotonicity-based approach that allows us to generalize previous results: we first show uniqueness and Lipschitz continuity of RQE with respect to the players' payoff matrices under monotonicity assumptions, and then provide conditions on the players' degrees of risk aversion and bounded rationality that ensure monotonicity. We then turn to discounted infinite-horizon Markov games. We define the risk-averse quantal-response Bellman operator and prove that it is a contraction under further conditions on the players' risk aversion, bounded rationality, and temporal discounting. This yields a Q-learning-based algorithm with convergence guarantees for all infinite-horizon general-sum Markov games.
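As a concrete illustration of the normal-form setting, the sketch below iterates logit quantal responses against entropic risk-averse evaluations of a small two-player cost game. The 2x2 cost matrices, the risk parameter `theta`, and the temperature `tau` are made-up values for illustration, and the naive fixed-point iteration is a heuristic stand-in, not the paper's algorithm or its exact risk measure.

```python
import numpy as np

def entropic_risk(costs, probs, theta):
    # Entropic risk measure rho(X) = (1/theta) * log E[exp(theta * X)];
    # theta > 0 encodes aversion to the random cost X.
    return np.log(probs @ np.exp(theta * costs)) / theta

def quantal_response(risk_values, tau):
    # Logit (softmax) response over costs; temperature tau > 0 models
    # bounded rationality (larger tau = noisier play).
    z = np.exp(-risk_values / tau)
    return z / z.sum()

# Hypothetical 2x2 cost matrices (row player A, column player B).
A = np.array([[1.0, 3.0], [0.0, 2.0]])
B = np.array([[2.0, 0.0], [3.0, 1.0]])

x = np.ones(2) / 2   # row player's mixed strategy
y = np.ones(2) / 2   # column player's mixed strategy
theta, tau = 0.5, 1.0

for _ in range(200):
    # Each player's risk-averse cost of each pure action vs. the opponent's mix.
    rx = np.array([entropic_risk(A[i], y, theta) for i in range(2)])
    ry = np.array([entropic_risk(B[:, j], x, theta) for j in range(2)])
    x_new, y_new = quantal_response(rx, tau), quantal_response(ry, tau)
    if max(np.abs(x_new - x).max(), np.abs(y_new - y).max()) < 1e-10:
        break
    x, y = x_new, y_new

print(x, y)  # approximate equilibrium strategy profile
```

At a fixed point of this map, each player's logit response to the risk-averse evaluation of the opponent's mixed strategy reproduces their own strategy, which is the defining property of an RQE.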