AOAD-MAT: Transformer-based multi-agent deep reinforcement learning model considering agents' order of action decisions

📅 2025-10-15

🤖 AI Summary
Existing multi-agent reinforcement learning (MARL) methods such as MAT and ACE model decisions sequentially but do not explicitly account for the order in which agents decide their actions, a factor that shapes cooperative behavior. AOAD-MAT addresses this by learning the action decision order as part of training. It adds an auxiliary task, predicting the next agent to act, which is optimized jointly with the PPO objective so that inter-agent action dependencies are modeled within the policy itself. Built on a Transformer-based actor-critic architecture, AOAD-MAT outperforms MAT and other baselines on the SMAC and MAMuJoCo benchmarks. The core contribution is treating the action decision order as a learnable quantity in MARL and showing that modeling it explicitly improves cooperative performance.

📝 Abstract
Multi-agent reinforcement learning (MARL) focuses on training the behaviors of multiple learning agents that coexist in a shared environment. Recently, MARL models such as the Multi-Agent Transformer (MAT) and ACtion dEpendent deep Q-learning (ACE) have significantly improved performance by leveraging sequential decision-making processes. Although these models can enhance performance, they do not explicitly consider the importance of the order in which agents make decisions. In this paper, we propose Agent Order of Action Decisions-MAT (AOAD-MAT), a novel MAT model that considers the order in which agents make decisions. The proposed model explicitly incorporates the sequence of action decisions into the learning process, allowing it to learn and predict the optimal order of agent actions. AOAD-MAT leverages a Transformer-based actor-critic architecture that dynamically adjusts the sequence of agent actions. To achieve this, we introduce a novel MARL architecture that cooperates with a subtask focused on predicting the next agent to act; this subtask is integrated into a Proximal Policy Optimization (PPO) based loss function to synergistically maximize the advantage of sequential decision-making. The proposed method was validated through extensive experiments on the StarCraft Multi-Agent Challenge and Multi-Agent MuJoCo benchmarks. The experimental results show that AOAD-MAT outperforms the existing MAT and other baseline models, demonstrating the effectiveness of adjusting the agents' order of action decisions in MARL.
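
The abstract states only that a next-agent prediction subtask is integrated into a PPO-based loss; it does not give the formula. The following is a minimal sketch of one way such a combination could look, not the paper's implementation. It assumes the subtask is trained with a cross-entropy term added to the standard clipped PPO surrogate. The function names, the weight `order_coef`, and the default `clip_eps` are hypothetical choices made for illustration.

```python
import math

def ppo_clip_loss(ratios, advantages, clip_eps=0.2):
    """Negative clipped PPO surrogate, averaged over samples."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        total += min(r * a, clipped * a)
    return -total / len(ratios)

def next_agent_loss(order_logits, next_agent_ids):
    """Cross-entropy between next-agent logits and the agent that acted next.

    Assumption: the subtask is a classification over agent indices.
    """
    total = 0.0
    for logits, target in zip(order_logits, next_agent_ids):
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[target]
    return total / len(order_logits)

def combined_loss(ratios, advantages, order_logits, next_agent_ids,
                  order_coef=0.5, clip_eps=0.2):
    """PPO policy loss plus a weighted auxiliary order-prediction loss.

    `order_coef` is a hypothetical hyperparameter, not taken from the paper.
    """
    policy = ppo_clip_loss(ratios, advantages, clip_eps)
    order = next_agent_loss(order_logits, next_agent_ids)
    return policy + order_coef * order
```

In a full actor-critic setup a value loss and an entropy bonus would normally be added as well; they are left out here to keep the auxiliary term visible.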

Problem

Research questions and friction points this paper is trying to address.

Addresses the importance of sequential decision-making in multi-agent reinforcement learning
Proposes a Transformer model that dynamically adjusts the order of agent actions
Improves performance by predicting the optimal agent decision sequence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based actor-critic architecture for multi-agent reinforcement learning
Explicitly incorporates the action decision sequence into the learning process
Integrates a subtask that predicts the next agent to act into a Proximal Policy Optimization based loss
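
To make the idea of a dynamically adjusted decision order concrete, here is a hedged sketch of a decision loop in which a next-agent predictor picks who acts after each step. It is an illustration under stated assumptions, not the paper's algorithm: `score_next_agent` and `choose_action` are hypothetical stand-ins for the model's order-prediction and policy outputs, the first agent is supplied by the caller, and the highest-scoring remaining agent is chosen greedily.

```python
from typing import Callable, List, Sequence, Tuple

History = List[Tuple[int, int]]  # (agent_id, action) pairs decided so far

def decide_in_predicted_order(
    n_agents: int,
    first_agent: int,
    score_next_agent: Callable[[History], Sequence[float]],
    choose_action: Callable[[int, History], int],
) -> History:
    """Let each agent act once, in an order chosen step by step.

    After every decision, the (hypothetical) order predictor scores all
    agents and the highest-scoring agent that has not yet acted goes next.
    """
    history: History = []
    remaining = set(range(n_agents))
    current = first_agent
    while True:
        # The current agent conditions on all earlier (agent, action) pairs.
        action = choose_action(current, history)
        history.append((current, action))
        remaining.discard(current)
        if not remaining:
            return history
        scores = score_next_agent(history)
        # Greedy pick among agents still waiting; ties go to the lowest id.
        current = max(sorted(remaining), key=lambda i: scores[i])

# Toy usage with fixed scores and an action equal to the number of prior decisions.
order = decide_in_predicted_order(
    n_agents=3,
    first_agent=2,
    score_next_agent=lambda history: [0.1, 0.9, 0.5],
    choose_action=lambda agent, history: len(history),
)
```

Greedy selection is only one option; sampling the next agent from the predicted distribution would be an equally plausible reading of the summary.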