🤖 AI Summary
To address limited task generalization and the challenges of online multi-task learning for continuous control in reinforcement learning, this paper introduces Newt: a language-conditioned multi-task world model. Methodologically, the authors construct a diverse language-vision benchmark comprising 200 tasks, and combine large-scale demonstration-based pretraining (learning task-aware representations and action priors) with online end-to-end reinforcement learning that jointly optimizes the world model across all tasks. The core contributions are a language-driven cross-task representation-sharing mechanism and a lightweight architecture that enables efficient online adaptation. Experiments demonstrate that Newt achieves significant improvements in multi-task performance and data efficiency over strong baselines, exhibits strong open-loop control, and adapts rapidly to unseen tasks. To foster reproducibility and community advancement, the authors fully release the environments, demonstration datasets, training and evaluation code, and over 200 trained checkpoints.
📝 Abstract
General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing the view that online RL does not scale. Inspired by the foundation-model recipe (large-scale pretraining followed by light RL), we ask whether a single agent can be trained on hundreds of tasks with online interaction. To accelerate research in this direction, we introduce a new benchmark with 200 diverse tasks spanning many domains and embodiments, each with language instructions, demonstrations, and optional image observations. We then present *Newt*, a language-conditioned multi-task world model that is first pretrained on demonstrations to acquire task-aware representations and action priors, and then jointly optimized with online interaction across all tasks. Experiments show that Newt yields better multi-task performance and data efficiency than a set of strong baselines, exhibits strong open-loop control, and enables rapid adaptation to unseen tasks. We release our environments, demonstrations, code for training and evaluation, as well as 200+ checkpoints.