PMAT: Optimizing Action Generation Order in Multi-Agent Reinforcement Learning

📅 2025-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address inefficient coordination in multi-agent reinforcement learning (MARL) caused by neglecting action-level dependencies among agents, this paper proposes a sequential decision-making framework in which the order of agent decisions is learned through Plackett-Luce sampling. The mechanism, Action Generation with Plackett-Luce Sampling (AGPS), treats decision order as a stochastic ranking driven by the significance of each agent's local observations. This makes order optimization trainable end to end and mitigates ranking instability and vanishing gradients. Integrating AGPS into the Multi-Agent Transformer yields the Prioritized Multi-Agent Transformer (PMAT), which ties each agent's position in the decision order to its decision credit. Evaluated on three benchmarks (the StarCraft II Multi-Agent Challenge, Google Research Football, and Multi-Agent MuJoCo), PMAT consistently outperforms state-of-the-art methods, with up to a 37% improvement in coordination efficiency. The results indicate that explicitly optimizing decision order is an important factor in MARL performance.
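
A Plackett-Luce ranking can be sampled cheaply with the Gumbel-top-k trick: perturb each agent's score with independent Gumbel noise and sort in descending order. The sketch below is a minimal illustration, not the authors' code. It assumes each agent already has a scalar priority score (for instance, one derived from the significance of its local observation), and the function name `sample_plackett_luce_order` is hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_plackett_luce_order(scores: Sequence[float], rng: random.Random) -> List[int]:
    """Sample an agent decision order from a Plackett-Luce distribution.

    Gumbel-top-k trick: add i.i.d. Gumbel(0, 1) noise to each score and sort in
    descending order. The resulting permutation follows the Plackett-Luce model
    with weights exp(score), so higher-scoring agents tend to decide earlier.
    """
    keyed = []
    for agent, score in enumerate(scores):
        u = max(rng.random(), 1e-12)  # random() is in [0, 1); avoid log(0)
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) sample
        keyed.append((score + gumbel, agent))
    keyed.sort(reverse=True)  # highest perturbed score decides first
    return [agent for _, agent in keyed]
```

For example, `sample_plackett_luce_order([2.0, 0.5, -1.0], random.Random(0))` returns a permutation of `[0, 1, 2]` in which agent 0 is the most likely, but not guaranteed, to decide first. Sampling instead of taking a hard sort keeps the order stochastic, so every order stays reachable during training.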

📝 Abstract
Multi-agent reinforcement learning (MARL) faces challenges in coordinating agents due to complex interdependencies within multi-agent systems. Most MARL algorithms use the simultaneous decision-making paradigm and ignore the action-level dependencies among agents, which reduces coordination efficiency. In contrast, the sequential decision-making paradigm provides finer-grained supervision over agent decision order, making it possible to handle these dependencies through better management of the decision order. However, determining the optimal decision order remains a challenge. In this paper, we introduce Action Generation with Plackett-Luce Sampling (AGPS), a novel mechanism for optimizing agent decision order. We model order determination as a Plackett-Luce sampling process to address issues such as ranking instability and vanishing gradients during network training. AGPS realizes credit-based decision order determination by establishing a bridge between the significance of agents' local observations and their decision credits, thus facilitating order optimization and dependency management. Integrating AGPS with the Multi-Agent Transformer, we propose the Prioritized Multi-Agent Transformer (PMAT), a sequential decision-making MARL algorithm with decision order optimization. Experiments on benchmarks including the StarCraft II Multi-Agent Challenge, Google Research Football, and Multi-Agent MuJoCo show that PMAT outperforms state-of-the-art algorithms, greatly enhancing coordination efficiency.
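
One way to see why a Plackett-Luce model helps against vanishing gradients: the probability of an order σ, P(σ) = ∏_i exp(s_σ(i)) / Σ_{j≥i} exp(s_σ(j)), is a smooth function of the agents' scores s, so its logarithm has a non-zero gradient, whereas a hard sort of the scores has zero gradient almost everywhere. The sketch below computes the log-probability and its analytic gradient in plain Python. The function names are hypothetical, and the exact training objective used in PMAT is not given in this summary.

```python
import math
from typing import List, Sequence


def _logsumexp(values: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def plackett_luce_log_prob(scores: Sequence[float], order: Sequence[int]) -> float:
    """log P(order) = sum_i [ s[order[i]] - logsumexp(s[order[j]] for j >= i) ]."""
    return sum(
        scores[order[i]] - _logsumexp([scores[k] for k in order[i:]])
        for i in range(len(order))
    )


def plackett_luce_grad(scores: Sequence[float], order: Sequence[int]) -> List[float]:
    """Analytic gradient of plackett_luce_log_prob with respect to every score."""
    grad = [0.0] * len(scores)
    for i in range(len(order)):
        remaining = order[i:]
        lse = _logsumexp([scores[k] for k in remaining])
        grad[order[i]] += 1.0  # the agent placed at position i
        for k in remaining:
            # Every agent still waiting pays its softmax share at this step.
            grad[k] -= math.exp(scores[k] - lse)
    return grad
```

The gradient credits the agent actually placed at each position (+1) and charges every agent still waiting its softmax share, so the components always sum to zero. A policy-gradient method can scale this vector by an advantage or credit signal.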

Problem

Research questions and friction points this paper is trying to address.

Optimizing agent decision order
Handling action-level dependencies
Enhancing multi-agent coordination efficiency
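
Sequential decision-making means the joint action is decoded one agent at a time in the chosen order, with each agent conditioning on the actions already fixed by its predecessors. The toy sketch below illustrates this with a hand-written target-selection policy standing in for the Multi-Agent Transformer decoder; `generate_actions_sequentially`, `make_greedy_target_policy`, and the target-claiming task are hypothetical illustrations, not part of PMAT.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical policy signature: (agent id, actions already fixed by earlier agents) -> action.
Policy = Callable[[int, Dict[int, int]], int]


def generate_actions_sequentially(order: Sequence[int], policy: Policy) -> Dict[int, int]:
    """Decode a joint action one agent at a time, following `order`.

    Each agent sees the actions of every agent that decided before it, which is how
    the sequential paradigm exposes action-level dependencies that simultaneous
    decision-making ignores.
    """
    chosen: Dict[int, int] = {}
    for agent in order:
        # Pass a copy so the policy cannot mutate the shared decoding state.
        chosen[agent] = policy(agent, dict(chosen))
    return chosen


def make_greedy_target_policy(preferences: Dict[int, List[int]]) -> Policy:
    """Toy policy: claim the most preferred target that no predecessor has claimed yet."""

    def policy(agent: int, previous: Dict[int, int]) -> int:
        taken = set(previous.values())
        for target in preferences[agent]:
            if target not in taken:
                return target
        return preferences[agent][0]  # every target taken: fall back to the top choice

    return policy
```

With `preferences = {0: [0, 1], 1: [0, 1]}`, the order `[0, 1]` yields `{0: 0, 1: 1}` while `[1, 0]` yields `{1: 0, 0: 1}`. Both agents avoid colliding on target 0, and the decision order decides who gets it, which is why the order itself is worth learning.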

Innovation

Methods, ideas, or system contributions that make the work stand out.

Plackett-Luce sampling for decision order
Credit-based decision order optimization
Multi-Agent Transformer integration
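
To make credit-based order optimization concrete, the sketch below nudges per-agent order scores with a plain REINFORCE (score-function) update: sample an order, observe a scalar credit for it, and move the scores along the gradient of that order's log-probability, scaled by the credit minus a running baseline. This is a generic stand-in chosen for illustration. The credit function, learning rate, baseline, and function names are assumptions, and PMAT's actual link between observation significance and decision credit is not reproduced here.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sample_order_with_grad(
    scores: Sequence[float], rng: random.Random
) -> Tuple[List[int], List[float]]:
    """Sample a Plackett-Luce order step by step and accumulate d log P(order) / d scores."""
    remaining = list(range(len(scores)))
    order: List[int] = []
    grad = [0.0] * len(scores)
    while remaining:
        m = max(scores[k] for k in remaining)  # subtract the max for numerical stability
        weights = [math.exp(scores[k] - m) for k in remaining]
        z = sum(weights)
        # Every agent still in the pool pays its probability of being picked now.
        for k, w in zip(remaining, weights):
            grad[k] -= w / z
        # Draw the next agent in proportion to its weight.
        r = rng.random() * z
        acc = 0.0
        pick = remaining[-1]  # fallback in case of floating-point round-off
        for k, w in zip(remaining, weights):
            acc += w
            if r < acc:
                pick = k
                break
        grad[pick] += 1.0  # the agent actually picked gains +1
        order.append(pick)
        remaining.remove(pick)
    return order, grad


def train_order_scores(
    credit_fn: Callable[[List[int]], float],
    n_agents: int,
    steps: int = 2000,
    lr: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """REINFORCE on Plackett-Luce scores: favor orders whose credit beats the baseline."""
    rng = random.Random(seed)
    scores = [0.0] * n_agents
    baseline = 0.0
    for _ in range(steps):
        order, grad = sample_order_with_grad(scores, rng)
        credit = credit_fn(order)
        advantage = credit - baseline
        baseline += 0.05 * (credit - baseline)  # running-mean baseline to reduce variance
        scores = [s + lr * advantage * g for s, g in zip(scores, grad)]
    return scores
```

With a hypothetical credit of 1.0 whenever agent 2 decides first, `train_order_scores(lambda order: 1.0 if order[0] == 2 else 0.0, n_agents=3)` drives agent 2's score above the others'. Because each sampled gradient sums to zero, the scores only shift relative to one another, which matches the fact that Plackett-Luce probabilities do not change when the same constant is added to every score.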