🤖 AI Summary
This paper addresses the N-Agent Heterogeneous Teamwork (NAHT) problem in multi-agent reinforcement learning—efficient collaboration among agents with no prior coordination, partial observability, and dynamically unknown numbers and types of teammates. We propose the first centralized Transformer-based NAHT framework, departing from independent learning paradigms by directly modeling cross-agent historical interaction sequences. Integrating centralized training with decentralized execution (CTDE) and multi-agent POMDP formalism, our approach achieves strong generalization without auxiliary tasks or explicit role inference. Evaluated on the StarCraft II benchmark, it improves sample efficiency by 37% over POAM and increases collaboration accuracy on unseen teammate types by 21%. This work provides the first empirical validation of sequential centralized modeling for dynamic, heterogeneous team coordination, demonstrating both effectiveness and superior generalization in open-team collaborative settings.
📝 Abstract
N-agent ad hoc teamwork (NAHT) is a newly introduced challenge in multi-agent reinforcement learning, where controlled subteams of varying sizes must dynamically collaborate with varying numbers and types of unknown teammates without pre-coordination. The existing learning algorithm (POAM) considers only independent learning for its flexibility in dealing with a changing number of agents. However, independent learning fails to fully capture the inter-agent dynamics essential for effective collaboration. Based on our observation that transformers deal effectively with sequences with varying lengths and have been shown to be highly effective for a variety of machine learning problems, this work introduces a centralized, transformer-based method for N-agent ad hoc teamwork. Our proposed approach incorporates historical observations and actions of all controlled agents, enabling optimal responses to diverse and unseen teammates in partially observable environments. Empirical evaluation on a StarCraft II task demonstrates that MAT-NAHT outperforms POAM, achieving superior sample efficiency and generalization, without auxiliary agent-modeling objectives.