CORA: Coalitional Rational Advantage Decomposition for Multi-Agent Policy Gradients

📅 2025-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In cooperative multi-agent reinforcement learning (MARL), sharing a single global advantage among agents ignores their distinct contributions and can lead to suboptimal policy updates. To address this, the paper proposes Coalitional Rational Advantage Decomposition (CORA), which evaluates coalitional advantages through the marginal contributions of coalitions and decomposes them with the core solution from cooperative game theory, enforcing rationality at the coalition level. Random coalition sampling keeps the computational cost manageable. Experiments on matrix games, differential games, and multi-agent collaboration benchmarks show that CORA outperforms strong baselines, particularly in tasks with multiple local optima.

📝 Abstract

This work focuses on the credit assignment problem in cooperative multi-agent reinforcement learning (MARL). Sharing the global advantage among agents often leads to suboptimal policy updates as it fails to account for the distinct contributions of agents. Although numerous methods consider global or individual contributions for credit assignment, a detailed analysis at the coalition level remains lacking in many approaches. This work analyzes the over-updating problem during multi-agent policy updates from a coalition-level perspective. To address this issue, we propose a credit assignment method called Coalitional Rational Advantage Decomposition (CORA). CORA evaluates coalitional advantages via marginal contributions from all possible coalitions and decomposes advantages using the core solution from cooperative game theory, ensuring coalitional rationality. To reduce computational overhead, CORA employs random coalition sampling. Experiments on matrix games, differential games, and multi-agent collaboration benchmarks demonstrate that CORA outperforms strong baselines, particularly in tasks with multiple local optima. These findings highlight the importance of coalition-aware credit assignment for improving MARL performance.
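For reference, the core in cooperative game theory is standardly defined as the set of allocations that are both efficient and coalitionally rational. The notation below is an assumption of this note, not necessarily the paper's own: N is the set of agents, v(C) is the value of coalition C (here it would play the role of a coalitional advantage), and x_i is the credit assigned to agent i. The paper's exact formulation may differ.

```latex
\[
\operatorname{Core}(v) = \Bigl\{ x \in \mathbb{R}^{|N|} \;:\;
  \sum_{i \in N} x_i = v(N), \quad
  \sum_{i \in C} x_i \ge v(C) \ \ \text{for all } C \subseteq N \Bigr\}
\]
```

The first condition distributes the whole grand-coalition value among the agents. The second says no coalition receives less total credit than the value it could claim on its own, which is the coalitional rationality the abstract refers to.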

Problem

Research questions and friction points this paper is trying to address.

Addresses credit assignment in cooperative multi-agent reinforcement learning

Analyzes coalition-level contributions to prevent over-updating policies

Proposes CORA, which decomposes advantages among agents while ensuring coalitional rationality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates coalitional advantages via marginal contributions of coalitions

Decomposes advantages using the core solution from cooperative game theory

Reduces overhead via random coalition sampling
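The points above can be illustrated with a minimal sketch, which is not the paper's implementation. It samples random coalitions and checks whether a candidate credit allocation satisfies the standard core inequality (each coalition's summed credit is at least its own value) on the sampled subsets only. The names `sample_coalitions`, `violated_coalitions`, and `toy_value` are hypothetical, and the toy value function is an assumption chosen purely for illustration.

```python
import random


def sample_coalitions(n_agents, n_samples, rng):
    """Draw distinct random coalitions (non-empty, proper subsets of the agents)."""
    max_distinct = 2 ** n_agents - 2  # excludes the empty set and the grand coalition
    target = min(n_samples, max_distinct)
    coalitions = set()
    while len(coalitions) < target:
        # Include each agent independently with probability 0.5.
        members = frozenset(i for i in range(n_agents) if rng.random() < 0.5)
        if 0 < len(members) < n_agents:
            coalitions.add(members)
    # Sort for a deterministic order regardless of set iteration order.
    return sorted(coalitions, key=sorted)


def violated_coalitions(credits, coalition_value, coalitions, tol=1e-9):
    """Return the sampled coalitions whose summed credit is below their own value."""
    return [
        c for c in coalitions
        if sum(credits[i] for i in c) < coalition_value(c) - tol
    ]


def toy_value(coalition):
    """Hypothetical coalitional advantage: the square of the coalition size."""
    return float(len(coalition) ** 2)


rng = random.Random(0)
coalitions = sample_coalitions(n_agents=3, n_samples=6, rng=rng)
grand_value = toy_value(frozenset(range(3)))  # total credit to distribute

equal_split = [3.0, 3.0, 3.0]   # sums to grand_value
skewed_split = [7.0, 1.0, 1.0]  # also sums to grand_value

print(violated_coalitions(equal_split, toy_value, coalitions))
print(violated_coalitions(skewed_split, toy_value, coalitions))
```

With three agents, six samples cover every proper coalition, so the check is exact here: the equal split violates nothing, while the skewed split under-credits the coalition of agents 1 and 2. With many agents, only a subset of the exponentially many coalitions would be checked, which is the trade-off random sampling makes.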
