AI Summary
This paper addresses multi-team zero-shot coordination (MT-ZSC), the challenge of collaborating with unfamiliar teams in multi-team systems (MTS). It formally defines the problem for the first time and introduces the first scalable Overcooked benchmark supporting N-agent team structures with multiple sub-groups. The paper proposes N-XPlay, a multi-agent reinforcement learning training paradigm that extends self-play by integrating intra-team role abstraction and inter-team coordination optimization. Experiments on 2-, 3-, and 5-player Overcooked tasks show that N-XPlay significantly improves cross-team collaboration success rates with unseen teammates, outperforming standard self-play baselines. The core contributions are: (1) a formal problem definition and evaluation framework for MT-ZSC; (2) a hierarchical training mechanism that jointly accommodates structured team organization and zero-shot generalization; and (3) an open-source, extensible multi-team collaborative benchmark and framework to foster reproducible research.
Abstract
Zero-shot coordination (ZSC), the ability to collaborate with unfamiliar partners, is essential to making autonomous agents effective teammates. Existing ZSC methods evaluate coordination capabilities between two agents who have not previously interacted. However, these scenarios do not reflect the complexity of real-world multi-agent systems, where coordination often involves a hierarchy of sub-groups and interactions between teams of agents, known as Multi-Team Systems (MTS). To address this gap, we first introduce N-player Overcooked, an N-agent extension of the popular two-agent ZSC benchmark, enabling evaluation of ZSC in N-agent scenarios. We then propose N-XPlay for ZSC in N-agent, multi-team settings. Comparisons against Self-Play (SP) across two-, three-, and five-player Overcooked scenarios, where agents are split between an "ego-team" and a group of unseen collaborators, show that agents trained with N-XPlay are better able to simultaneously balance "intra-team" and "inter-team" coordination than agents trained with SP.