Results of the NeurIPS 2023 Neural MMO Competition on Multi-task Reinforcement Learning

📅 2025-08-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of cross-task and cross-environment generalization in multi-task reinforcement learning. It proposes a unified training framework based on goal-conditioned policies and instantiates it in Neural MMO, a high-complexity, open-world multi-agent environment. Methodologically, goal representation is decoupled from the policy network, and generalization is jointly optimized across three previously unseen dimensions: tasks, maps, and opponent policies. Experiments demonstrate that the best configuration achieves four times the baseline score within eight hours of training on a single 4090 GPU, while significantly improving zero-shot transfer performance. To foster reproducibility and community advancement, all code and model weights are open-sourced; the competition attracted over 200 participants. This work establishes a rigorous, reproducible benchmark and a principled technical paradigm for generalization in open-world multi-agent reinforcement learning.


📝 Abstract
We present the results of the NeurIPS 2023 Neural MMO Competition, which attracted over 200 participants and submissions. Participants trained goal-conditional policies that generalize to tasks, maps, and opponents never seen during training. The top solution achieved a score 4x higher than our baseline within 8 hours of training on a single 4090 GPU. We open-source everything relating to Neural MMO and the competition under the MIT license, including the policy weights and training code for our baseline and for the top submissions.
Problem

Research questions and friction points this paper is trying to address.

Generalizing goal-conditional policies to tasks unseen during training
Training agents that adapt to new maps and new opponent policies
Establishing a reproducible multi-task reinforcement learning benchmark through competition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Goal-conditional policies for zero-shot generalization
Efficient training: top score reached in 8 hours on a single GPU
Fully open-source competition framework, baselines, and winning solutions (MIT license)
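The goal-conditional idea above can be sketched in a few lines. This is an illustrative toy in plain Python, not the competition architecture or the Neural MMO API: all names (`GoalConditionedPolicy`, `act`, etc.) are hypothetical, and the "encoders" are fixed random linear maps, chosen only to show the structural point from the summary, that the goal representation is kept separate from the observation pathway, so an unseen goal changes the policy's input embedding rather than requiring a new policy.

```python
import random

random.seed(0)

def linear(in_dim, out_dim):
    """A fixed, randomly initialized linear map (stand-in for a trained layer)."""
    return [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

def apply(weights, x):
    """Matrix-vector product over plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

class GoalConditionedPolicy:
    """Toy goal-conditioned policy: the goal encoder is decoupled from the
    observation encoder, and their embeddings are concatenated before the
    action head."""

    def __init__(self, obs_dim, goal_dim, hidden, n_actions):
        self.obs_enc = linear(obs_dim, hidden)
        self.goal_enc = linear(goal_dim, hidden)  # separate goal pathway
        self.head = linear(2 * hidden, n_actions)

    def act(self, obs, goal):
        # Encode observation and goal independently, then combine.
        z = apply(self.obs_enc, obs) + apply(self.goal_enc, goal)
        logits = apply(self.head, z)
        return max(range(len(logits)), key=lambda a: logits[a])  # greedy action

policy = GoalConditionedPolicy(obs_dim=4, goal_dim=3, hidden=8, n_actions=5)
action = policy.act(obs=[0.1, 0.2, 0.3, 0.4], goal=[1.0, 0.0, 0.0])
print(action)  # an integer action index in [0, 5)
```

In a real training setup, the two encoders and the head would be learned networks optimized with reinforcement learning over many sampled goals; the decoupling is what lets the same trained policy be conditioned on goals it never saw during training.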
👥 Authors
Joseph Suárez (Massachusetts Institute of Technology)
Kyoung Whan Choe (CarperAI)
David Bloomin (Plurality Institute)
Jianming Gao (Competition Winners)
Yunkun Li (Competition Winners)
Yao Feng (Competition Winners)
Saidinesh Pola (Competition Winners)
Kun Zhang (Competition Winners)
Yonghui Zhu (Competition Winners)
Nikhil Pinnaparaju (CarperAI)
Hao Xiang Li (CarperAI)
Nishaanth Kanna (CarperAI)
Daniel Scott (CarperAI)
Ryan Sullivan (University of Maryland, College Park)
Rose S. Shuman (CarperAI)
Lucas de Alcântara (CarperAI)
Herbie Bradley (CarperAI)
Kirsty You (Parametrix.AI)
Bo Wu (Parametrix.AI)
Yuhao Jiang (EPFL)
Qimai Li (Parametrix.AI)
Jiaxin Chen (Parametrix.AI)
Louis Castricato (Wayfarer Labs)
Xiaolong Zhu (Parametrix.AI)
Phillip Isola (MIT)