🤖 AI Summary
This report presents the results of the NeurIPS 2023 Neural MMO Competition, which drew over 200 participants. Competitors trained goal-conditioned policies in Neural MMO, a high-complexity, open-world multi-agent environment, and were evaluated on zero-shot generalization along three dimensions unseen during training: tasks, maps, and opponent policies. The top submission scored four times higher than the organizers' baseline after eight hours of training on a single GPU. All code and policy weights for both the baseline and the top submissions are open-sourced under the MIT license, establishing a reproducible benchmark for studying generalization in open-world multi-agent reinforcement learning.
📝 Abstract
We present the results of the NeurIPS 2023 Neural MMO Competition, which attracted over 200 participants and submissions. Participants trained goal-conditioned policies that generalize to tasks, maps, and opponents never seen during training. The top solution achieved a score 4x higher than our baseline within 8 hours of training on a single RTX 4090 GPU. We open-source everything related to Neural MMO and the competition under the MIT license, including the policy weights and training code for our baseline and for the top submissions.
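For readers unfamiliar with the term, a goal-conditioned policy takes an encoding of the current goal (task) as an extra input alongside the observation, so a single network can pursue many different tasks, including ones not seen during training. A minimal sketch of the idea follows; all dimensions, weights, and names here are hypothetical stand-ins, not the competition's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; Neural MMO's real observation and task
# encodings are far richer than this.
OBS_DIM, GOAL_DIM, HIDDEN, N_ACTIONS = 8, 4, 16, 5

# Randomly initialized weights standing in for a trained policy.
W_obs = rng.normal(size=(OBS_DIM, HIDDEN))
W_goal = rng.normal(size=(GOAL_DIM, HIDDEN))
W_out = rng.normal(size=(HIDDEN, N_ACTIONS))

def goal_conditioned_policy(obs, goal):
    """Map an observation plus a goal encoding to action probabilities.

    Conditioning on the goal lets one network serve many tasks:
    swapping in a new goal vector retargets behavior without
    training a separate task-specific policy.
    """
    h = np.tanh(obs @ W_obs + goal @ W_goal)  # fuse observation and goal
    logits = h @ W_out
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

# Same observation, two different goals -> two action distributions.
obs = rng.normal(size=OBS_DIM)
probs_a = goal_conditioned_policy(obs, rng.normal(size=GOAL_DIM))
probs_b = goal_conditioned_policy(obs, rng.normal(size=GOAL_DIM))
```

Zero-shot generalization to unseen tasks then amounts to feeding the policy the encoding of a goal it was never trained on.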