Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the Risk-Sensitive Quantal Response Equilibrium (RQRE) to address three challenges in computing Nash equilibria of general-sum Markov games: computational intractability, equilibrium multiplicity, and sensitivity to approximation error. The accompanying algorithm combines optimistic value iteration with linear function approximation to efficiently learn a unique, smooth, and robust equilibrium policy in large or continuous state spaces. Theoretically, RQRE traces a Pareto frontier between expected performance and robustness, governed by the rationality and risk-sensitivity parameters; it admits a distributionally robust optimization interpretation; and its policy map is Lipschitz continuous in the estimated payoffs. Algorithmically, the proposed RQRE-OVI enjoys finite-sample regret bounds and explicit sample-complexity guarantees. Empirically, RQRE-OVI is competitive with Nash-based approaches under self-play and substantially more robust under cross-play, with better generalization and convergence behavior.
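To make the solution concept concrete, the sketch below computes a risk-sensitive quantal response at a single state: action values are first adjusted by an entropic risk measure, then played through a softmax (logit) response. This is a minimal illustration under assumed conventions; the function names and the parameters `tau` (risk sensitivity) and `lam` (rationality) are ours, not the paper's notation.

```python
import numpy as np

def entropic_risk(payoff_samples: np.ndarray, tau: float) -> float:
    """Entropic risk rho_tau(X) = -(1/tau) * log E[exp(-tau * X)].

    tau -> 0 recovers the risk-neutral expectation; larger tau
    penalizes payoff variance more heavily.
    """
    if tau == 0.0:
        return float(np.mean(payoff_samples))
    return float(-np.log(np.mean(np.exp(-tau * payoff_samples))) / tau)

def quantal_response(adjusted_values: np.ndarray, lam: float) -> np.ndarray:
    """Softmax (logit) response with rationality parameter lam.

    lam -> infinity approaches a best response; lam = 0 is uniform play,
    which is why the policy map stays smooth in the estimated payoffs.
    """
    z = lam * adjusted_values
    z = z - z.max()                # shift for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()

# Toy example: a high-mean/high-variance action vs. a safer one.
rng = np.random.default_rng(0)
payoffs = [rng.normal(1.0, 2.0, 10_000), rng.normal(0.8, 0.1, 10_000)]
adjusted = np.array([entropic_risk(q, tau=0.5) for q in payoffs])
print(quantal_response(adjusted, lam=2.0))  # mass shifts toward the safer action
```

With `tau = 0` and `lam -> infinity` the response collapses to a risk-neutral best response, consistent with the paper's claim that Nash is recovered in the limit of perfect rationality and risk neutrality.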

📝 Abstract
Provably efficient and robust equilibrium computation in general-sum Markov games remains a core challenge in multi-agent reinforcement learning. Nash equilibrium is computationally intractable in general and brittle due to equilibrium multiplicity and sensitivity to approximation error. We study Risk-Sensitive Quantal Response Equilibrium (RQRE), which yields a unique, smooth solution under bounded rationality and risk sensitivity. We propose RQRE-OVI, an optimistic value iteration algorithm for computing RQRE with linear function approximation in large or continuous state spaces. Through finite-sample regret analysis, we establish convergence and explicitly characterize how sample complexity scales with rationality and risk-sensitivity parameters. The regret bounds reveal a quantitative tradeoff: increasing rationality tightens regret, while risk sensitivity induces regularization that enhances stability and robustness. This exposes a Pareto frontier between expected performance and robustness, with Nash recovered in the limit of perfect rationality and risk neutrality. We further show that the RQRE policy map is Lipschitz continuous in estimated payoffs, unlike Nash, and RQRE admits a distributionally robust optimization interpretation. Empirically, we demonstrate that RQRE-OVI achieves competitive performance under self-play while producing substantially more robust behavior under cross-play compared to Nash-based approaches. These results suggest RQRE-OVI offers a principled, scalable, and tunable path for equilibrium learning with improved robustness and generalization.
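For the value-learning side, the sketch below shows one optimistic least-squares backup under linear function approximation, in the spirit of LSVI-UCB. It is not the authors' exact RQRE-OVI update: the bonus coefficient `beta`, the feature map `phi`, and the generic bootstrapped target are assumptions, and the risk-sensitive quantal-response step that RQRE-OVI would apply to the target is omitted here.

```python
import numpy as np

def optimistic_backup(phi, rewards, next_values, beta, ridge=1.0):
    """One optimistic least-squares value-iteration backup.

    phi         : (n, d) features of the visited state-action pairs
    rewards     : (n,)   observed rewards
    next_values : (n,)   bootstrapped values at the next states
    beta        : exploration-bonus coefficient (assumed, set by theory)
    Returns the ridge-regression weights and an optimistic Q estimator.
    """
    n, d = phi.shape
    gram = phi.T @ phi + ridge * np.eye(d)          # regularized Gram matrix
    weights = np.linalg.solve(gram, phi.T @ (rewards + next_values))
    gram_inv = np.linalg.inv(gram)

    def optimistic_q(phi_sa: np.ndarray) -> float:
        # Linear estimate plus elliptical bonus beta * ||phi||_{Lambda^{-1}},
        # the standard optimism term that drives exploration in linear MDPs.
        bonus = beta * np.sqrt(phi_sa @ gram_inv @ phi_sa)
        return float(phi_sa @ weights + bonus)

    return weights, optimistic_q
```

In a full algorithm this backup would run once per horizon step, with `next_values` computed from the risk-adjusted quantal-response policy of the stage above rather than a plain max.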
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning
Markov Games
Nash Equilibrium
Robustness
Equilibrium Computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Risk-Sensitive Quantal Response Equilibrium
Optimistic Value Iteration
Linear Function Approximation
Distributionally Robust Optimization
Multi-Agent Reinforcement Learning
Jake Gonzales
Department of Electrical & Computer Engineering, University of Washington
Max Horwitz
Department of Electrical & Computer Engineering, University of Washington
Eric Mazumdar
Assistant Professor, California Institute of Technology
Lillian J. Ratliff
Associate Professor, University of Washington
Game Theory · Machine Learning · Optimization · Control Theory