🤖 AI Summary
To address limited task generalization and the challenges of online multi-task learning for continuous control in reinforcement learning, this paper introduces Newt: a language-conditioned multi-task world model. Methodologically, the authors construct a diverse language-vision benchmark comprising 200 tasks, and combine large-scale demonstration-based pretraining (learning task-aware representations and action priors) with online end-to-end reinforcement learning that jointly optimizes the world model across all tasks. The core contributions are a language-driven cross-task representation-sharing mechanism and a lightweight architecture that enables efficient online adaptation. Experiments demonstrate that Newt achieves significant improvements in multi-task performance and data efficiency over strong baselines, exhibits strong open-loop control, and adapts rapidly to unseen tasks. To foster reproducibility and community advancement, the authors fully release the environments, demonstration datasets, training and evaluation code, and over 200 trained checkpoints.
📝 Abstract
General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing the view that online RL does not scale. Inspired by the foundation-model recipe (large-scale pretraining followed by light RL), we ask whether a single agent can be trained on hundreds of tasks with online interaction. To accelerate research in this direction, we introduce a new benchmark with 200 diverse tasks spanning many domains and embodiments, each with language instructions, demonstrations, and optional image observations. We then present *Newt*, a language-conditioned multi-task world model that is first pretrained on demonstrations to acquire task-aware representations and action priors, and then jointly optimized with online interaction across all tasks. Experiments show that Newt yields better multi-task performance and data efficiency than a set of strong baselines, exhibits strong open-loop control, and enables rapid adaptation to unseen tasks. We release our environments, demonstrations, code for training and evaluation, as well as 200+ checkpoints.