Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
In cooperative multi-agent reinforcement learning (MARL), existing risk-averse methods often converge to suboptimal equilibria, while optimistic approaches lack theoretical foundations. Method: This paper formally characterizes optimism as risk-seeking behavior, proposing a decentralized MARL framework grounded in the dual representation of convex risk measures. Its core contributions are an optimistic value function, a policy-gradient theorem for it, and a decentralized actor-critic algorithm that combines KL-divergence regularization with risk-sensitive optimization. Results: Experiments on multiple cooperative benchmark tasks show that the proposed method significantly outperforms risk-neutral baselines and heuristic optimistic approaches, improving both coordination efficiency and policy stability.

📝 Abstract
Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL) have largely emphasized the risk-averse setting, prioritizing robustness to uncertainty. In cooperative MARL, however, such conservatism often leads to suboptimal equilibria, and a parallel line of work has shown that optimism can promote cooperation. Existing optimistic methods, though effective in practice, are typically heuristic and lack theoretical grounding. Building on the dual representation for convex risk measures, we propose a principled framework that interprets risk-seeking objectives as optimism. We introduce optimistic value functions, which formalize optimism as divergence-penalized risk-seeking evaluations. Building on this foundation, we derive a policy-gradient theorem for optimistic value functions, including explicit formulas for the entropic risk/KL-penalty setting, and develop decentralized optimistic actor-critic algorithms that implement these updates. Empirical results on cooperative benchmarks demonstrate that risk-seeking optimism consistently improves coordination over both risk-neutral baselines and heuristic optimistic methods. Our framework thus unifies risk-sensitive learning and optimism, offering a theoretically grounded and practically effective approach to cooperation in MARL.
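The abstract's entropic risk/KL-penalty setting rests on the standard dual (variational) representation of the entropic risk measure: the risk-seeking evaluation (1/β)·log E_P[exp(βX)] equals sup over Q of E_Q[X] − (1/β)·KL(Q‖P), attained by exponentially tilting P toward high returns. A minimal numeric sketch of that identity (illustrative only, not the paper's code; all variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=0.5, size=6)   # sample returns X
p = np.full(6, 1.0 / 6.0)                          # reference distribution P
beta = 2.0                                         # optimism temperature

# Primal: entropic risk-seeking evaluation (1/beta) * log E_P[exp(beta * X)].
entropic = np.log(np.sum(p * np.exp(beta * returns))) / beta

# Dual optimizer: exponential tilting Q* proportional to P * exp(beta * X).
q = p * np.exp(beta * returns)
q /= q.sum()

# Dual objective at Q*: E_Q[X] - (1/beta) * KL(Q || P).
kl = np.sum(q * np.log(q / p))
dual = np.sum(q * returns) - kl / beta

# The two sides of the dual representation coincide.
assert abs(entropic - dual) < 1e-8
```

By Jensen's inequality the entropic value also upper-bounds the risk-neutral expectation, which is the sense in which the risk-seeking evaluation is "optimistic".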
Problem

Research questions and friction points this paper is trying to address.

Modeling optimism as risk-seeking behavior in multi-agent reinforcement learning
Addressing suboptimal equilibria from conservatism in cooperative MARL settings
Providing theoretical grounding for heuristic optimistic methods in MARL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Risk-seeking objectives formalized as optimism
Policy-gradient theorem for optimistic value functions
Decentralized optimistic actor-critic algorithms developed
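One way to picture how a risk-seeking critic differs from a risk-neutral one is through an entropic Bellman backup, consistent with the entropic-risk form the abstract mentions. The sketch below is a hypothetical illustration, not the paper's algorithm; the function name and parameters are assumptions:

```python
import numpy as np

def optimistic_backup(rewards, next_values, probs, gamma=0.99, beta=1.0):
    """Entropic risk-seeking Bellman backup over next-state outcomes.

    As beta -> 0 this recovers the risk-neutral expected TD target;
    larger beta weights favorable outcomes more heavily (optimism).
    """
    targets = rewards + gamma * next_values
    return np.log(np.sum(probs * np.exp(beta * targets))) / beta

# Two equally likely outcomes: one unrewarded, one rewarded.
probs = np.array([0.5, 0.5])
rewards = np.array([0.0, 1.0])
next_values = np.array([0.0, 0.0])

neutral = np.sum(probs * (rewards + 0.99 * next_values))      # expected target
optimistic = optimistic_backup(rewards, next_values, probs, beta=5.0)
# optimistic > neutral: the backup tilts toward the lucky outcome
```

Intuitively, this is why such a critic can escape the conservative equilibria the Problem section describes: miscoordination penalties caused by exploring teammates are down-weighted relative to the best observed joint outcomes.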
Runyu Zhang
Laboratory for Information & Decision Systems, Massachusetts Institute of Technology
Na Li
Harvard University
Asuman Ozdaglar
Mathworks Professor, EECS, MIT
Optimization and Game Theory · Machine Learning · Economic and Social Networks
Jeff Shamma
University of Illinois at Urbana-Champaign
Gioele Zardini
Rudge (1948) and Nancy Allen Assistant Professor at MIT
Robotic Networks · Co-Design · Multi-Agent Autonomy · Compositionality · ITS