AI Summary
This paper addresses the challenge of adapting a single agent to multi-agent systems with unknown teammates and opponents. To tackle this, the authors propose Agent Collaborative-Competitive Adaptation (ACCA), an evaluation setting that tests whether an agent generalizes across tasks, environments, and interactions. Methodologically, they introduce Multi-Retrieval and Dynamic Generation (MRDG), a modeling approach that uses the behavioral trajectories of teammates and opponents and combines three components: a positional encoder that handles varying team sizes, a hypernetwork module, and a viewpoint alignment module that reconciles the observational perspectives of retrieved teammates and opponents with that of the learning agent. Evaluated on the SMAC, Overcooked-AI, and Melting Pot benchmarks, MRDG significantly improves robust collaboration and competition with unseen teammates and opponents, surpassing established baselines. The work offers both an evaluation setting (ACCA) and a modeling approach (MRDG) for generalization in multi-agent reinforcement learning.
Abstract
Adapting a single agent to a new multi-agent system is challenging: the agent must adjust to different tasks and environments and to interactions with unknown teammates and opponents. Because addressing this challenge in full is highly complex, researchers have proposed two simplified scenarios: multi-agent reinforcement learning for zero-shot learning, and Ad-Hoc Teamwork. Building on these foundations, we propose a more comprehensive setting, Agent Collaborative-Competitive Adaptation (ACCA), which evaluates an agent's ability to generalize across diverse scenarios, tasks, and interactions with both unfamiliar opponents and teammates. In ACCA, agents adjust to task and environmental changes, collaborate with unseen teammates, and compete against unknown opponents. We introduce a new modeling approach, Multi-Retrieval and Dynamic Generation (MRDG), that effectively models both teammates and opponents using their behavioral trajectories. This method incorporates a positional encoder for varying team sizes and a hypernetwork module to boost agents' learning and adaptive capabilities. Additionally, a viewpoint alignment module harmonizes the observational perspectives of retrieved teammates and opponents with that of the learning agent. Extensive tests in benchmark scenarios such as SMAC, Overcooked-AI, and Melting Pot show that MRDG significantly improves robust collaboration and competition with unseen teammates and opponents, surpassing established baselines. Our code is available at: https://github.com/vcis-wangchenxu/MRDG.git
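To make the retrieval and hypernetwork ideas above concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`retrieve`, `mean_pool`, `hyper_linear`), the use of cosine similarity, mean pooling, and a purely linear hypernetwork are all assumptions chosen for illustration. The positional encoder and the viewpoint alignment module are omitted.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def retrieve(query, bank, k):
    """Return the k stored trajectory embeddings most similar to the query.
    (Assumption: similarity is cosine; the paper may use something else.)"""
    ranked = sorted(bank, key=lambda e: cosine(query, e), reverse=True)
    return ranked[:k]

def mean_pool(vectors):
    """Average a list of vectors so any number of retrieved items works."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def hyper_linear(context, hyper_w, obs, out_dim):
    """Hypernetwork step: a linear map turns `context` into the weights of
    an (out_dim x len(obs)) layer, which is then applied to `obs`."""
    in_dim = len(obs)
    # One generated weight per row of hyper_w (out_dim * in_dim rows in total).
    flat = [sum(w * c for w, c in zip(row, context)) for row in hyper_w]
    weights = [flat[i * in_dim:(i + 1) * in_dim] for i in range(out_dim)]
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]

# Hypothetical sizes and random data, for illustration only.
rng = random.Random(0)
ctx_dim, obs_dim, n_actions = 4, 3, 2
bank = [[rng.uniform(-1, 1) for _ in range(ctx_dim)] for _ in range(10)]
hyper_w = [[rng.uniform(-1, 1) for _ in range(ctx_dim)]
           for _ in range(n_actions * obs_dim)]

query = bank[3]  # stands in for an embedding of the current partners' behavior
context = mean_pool(retrieve(query, bank, k=3))
logits = hyper_linear(context, hyper_w, [0.5, -0.2, 0.1], n_actions)
```

The point of the design is that the policy layer's weights are generated from a summary of the retrieved behavior, so the same observation can yield different action logits when the agent faces different teammates or opponents.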