🤖 AI Summary
In multi-agent reinforcement learning (MARL), decentralized execution under partial observability hinders effective coordination, because each agent acts only on its local observations. To address this, we propose the Factor-based Multi-Agent Transformer (f-MAT), which models the interactions among neighboring agents as a graph. f-MAT rests on two mechanisms: factor-based grouping, which divides agents into overlapping groups and represents each group with a factor, and factor-based attention layers, which pass messages among neighboring agents through these factors. Rather than limiting execution to purely local observations, as the standard centralized training with decentralized execution (CTDE) paradigm does, f-MAT uses a transformer encoder-decoder that lets neighboring agents communicate during both training and execution. This design supports efficient message passing and parallel action generation. Evaluated on networked systems, including traffic scheduling and power control, f-MAT outperforms strong baselines.
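As a rough illustration of the grouping idea, the following Python sketch derives overlapping groups from an interaction graph by treating each agent's closed neighborhood (the agent plus its direct neighbors) as one factor. This grouping rule and the helper names `build_factors` and `agent_to_factors` are assumptions made for illustration; the actual f-MAT partitioning scheme may differ.

```python
from typing import Dict, List, Set


def build_factors(adjacency: Dict[int, Set[int]]) -> List[List[int]]:
    """Hypothetical grouping rule: one factor per agent's closed neighborhood.

    `adjacency` maps each agent to the set of its neighbors in an undirected
    interaction graph. Identical groups are kept only once.
    """
    factors: List[List[int]] = []
    seen: Set[tuple] = set()
    for agent in sorted(adjacency):
        members = sorted({agent} | adjacency[agent])
        key = tuple(members)
        if key not in seen:
            seen.add(key)
            factors.append(members)
    return factors


def agent_to_factors(factors: List[List[int]]) -> Dict[int, List[int]]:
    """Map each agent to the indices of all factors containing it (groups overlap)."""
    membership: Dict[int, List[int]] = {}
    for f_idx, members in enumerate(factors):
        for agent in members:
            membership.setdefault(agent, []).append(f_idx)
    return membership


# Example: four agents (e.g., intersections) connected in a line 0-1-2-3.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
factors = build_factors(adjacency)
print(factors)                    # [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
print(agent_to_factors(factors))  # {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
```

Each interior agent belongs to several factors, which is what makes the groups overlap and lets information cross group boundaries.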
📝 Abstract
In multi-agent reinforcement learning, a commonly used paradigm is centralized training with decentralized execution. In this framework, however, decentralized execution restricts the development of coordinated policies because each agent acts on limited local observations. In this paper, we consider cooperation among neighboring agents during execution and formulate their interactions as a graph. Building on this formulation, we introduce a novel encoder-decoder architecture named Factor-based Multi-Agent Transformer ($f$-MAT) that uses a transformer to enable communication between neighboring agents during both training and execution. By dividing agents into overlapping groups and representing each group with a factor, $f$-MAT achieves efficient message passing and parallel action generation through factor-based attention layers. Empirical results in networked systems such as traffic scheduling and power control show that $f$-MAT outperforms strong baselines, paving the way for handling complex collaborative problems.
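The factor-based message passing described in the abstract can be pictured with the following Python sketch of a single attention layer, written without any ML library. It assumes a two-step scheme that the abstract does not spell out: each factor is first summarized as the mean of its members' embeddings, then every agent attends over the factors it belongs to with scaled dot-product attention and adds the result back through a residual connection. The function names, the mean pooling, and the identity query/key/value projections are hypothetical simplifications, not the published $f$-MAT layer.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def factor_embeddings(h: Dict[int, Vector], factors: List[List[int]]) -> List[Vector]:
    """Agent-to-factor step (assumed): each factor is the mean of its members' embeddings."""
    dim = len(next(iter(h.values())))
    return [
        [sum(h[a][d] for a in members) / len(members) for d in range(dim)]
        for members in factors
    ]


def factor_attention_layer(h: Dict[int, Vector], factors: List[List[int]]) -> Dict[int, Vector]:
    """Factor-to-agent step: each agent attends only over the factors that contain it.

    Every update reads only the previous layer's embeddings, so all agents can be
    processed in parallel (a plain loop stands in for a batched operation here).
    """
    f_emb = factor_embeddings(h, factors)
    new_h: Dict[int, Vector] = {}
    for agent, h_i in h.items():
        idx = [k for k, members in enumerate(factors) if agent in members]
        if not idx:  # agent belongs to no factor: pass its embedding through unchanged
            new_h[agent] = list(h_i)
            continue
        dim = len(h_i)
        # Scaled dot-product scores; identity query/key/value projections are an assumption.
        scores = [sum(x * y for x, y in zip(h_i, f_emb[k])) / math.sqrt(dim) for k in idx]
        weights = softmax(scores)
        message = [sum(w * f_emb[k][d] for w, k in zip(weights, idx)) for d in range(dim)]
        new_h[agent] = [x + m for x, m in zip(h_i, message)]  # residual connection
    return new_h


# Example: overlapping factors for four agents in a line 0-1-2-3.
factors = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
h = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0], 3: [-1.0, 0.5]}
out = factor_attention_layer(h, factors)
for agent, vec in out.items():
    print(agent, [round(x, 3) for x in vec])
```

Stacking several such layers would let information travel more than one hop across the graph, and a per-agent policy head applied to the final embeddings could then produce every agent's action in the same parallel pass; neither step is shown in this sketch.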