Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of low sample efficiency in multi-task reinforcement learning within visual domains, where tasks exhibit substantial differences in observations and dynamics. The authors propose a scalable architecture that, for the first time, integrates a mixture-of-experts mechanism with gradient-based task clustering into a world model. The approach employs a modular variational autoencoder for task-adaptive visual compression and a hybrid Transformer dynamics model combining task-conditioned experts with a shared backbone, enabling efficient parameter sharing and task specialization within a unified framework. Evaluated on Atari 100k, the single-model solution achieves a 110.4% human-normalized score, approaching the performance of an ensemble of 26 specialized models while using 50% fewer parameters. On Meta-World, it attains a 74.5% success rate within 300k environment steps, establishing a new state of the art.

📝 Abstract
A fundamental challenge in multi-task reinforcement learning (MTRL) is achieving sample efficiency in visual domains where tasks exhibit substantial heterogeneity in both observations and dynamics. Model-based reinforcement learning offers a promising path to improved sample efficiency through world models, but standard monolithic architectures struggle to capture diverse task dynamics, resulting in poor reconstruction and prediction accuracy. We introduce Mixture-of-World Models (MoW), a scalable architecture that combines modular variational autoencoders for task-adaptive visual compression, a hybrid Transformer-based dynamics model with task-conditioned experts and a shared backbone, and a gradient-based task clustering strategy for efficient parameter allocation. On the Atari 100k benchmark, a single MoW agent trained once on 26 Atari games achieves a mean human-normalized score of 110.4%, competitive with the score of 114.2% achieved by STORM, an ensemble of 26 task-specific models, while using 50% fewer parameters. On Meta-World, MoW achieves a 74.5% average success rate within 300 thousand environment steps, establishing a new state of the art. These results demonstrate that MoW provides a scalable and parameter-efficient foundation for generalist world models.
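The hybrid dynamics design the abstract describes, a shared backbone combined with task-conditioned experts, can be sketched in a few lines. This is an illustrative NumPy toy, not the authors' implementation: the layer sizes, the `ffn` helper, and the fixed `task_to_cluster` table are all hypothetical (in MoW the assignment would come from gradient-based task clustering, and the dynamics model is a Transformer, not a single MLP layer).

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn(d_in, d_hidden):
    """Two-layer ReLU feed-forward block with random weights (illustrative only)."""
    W1 = rng.standard_normal((d_in, d_hidden)) * 0.1
    W2 = rng.standard_normal((d_hidden, d_in)) * 0.1
    return lambda x: np.maximum(x @ W1, 0.0) @ W2

d_model, n_clusters = 16, 3
shared = ffn(d_model, 4 * d_model)                    # shared backbone FFN
experts = [ffn(d_model, 4 * d_model) for _ in range(n_clusters)]

# Hypothetical mapping from task to expert cluster; in the paper this
# would be produced by gradient-based task clustering, not hand-written.
task_to_cluster = {"pong": 0, "breakout": 0, "seaquest": 1, "frostbite": 2}

def hybrid_layer(x, task):
    """Shared computation plus one task-conditioned expert, with a residual."""
    return x + shared(x) + experts[task_to_cluster[task]](x)

tokens = rng.standard_normal((8, d_model))            # a batch of latent tokens
y = hybrid_layer(tokens, "pong")
assert y.shape == tokens.shape
```

Every task flows through the shared path, so common structure is learned once; only the small expert branch specializes, which is where the parameter savings over 26 separate world models would come from.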
Problem

Research questions and friction points this paper is trying to address.

multi-task reinforcement learning
sample efficiency
visual domains
heterogeneous dynamics
world models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-World Models
multi-task reinforcement learning
modular latent dynamics
task-conditioned experts
parameter-efficient world models
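The gradient-based task clustering idea listed above, grouping tasks whose training gradients point in similar directions so they can share an expert, can be sketched as follows. This is a hedged toy, not the paper's algorithm: the per-task gradient "fingerprints", the cosine threshold, and the greedy assignment rule are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-task gradient fingerprints: e.g. the average gradient of a
# shared loss w.r.t. shared parameters, flattened to one vector per task.
tasks = ["pong", "breakout", "seaquest", "frostbite"]
grads = {t: rng.standard_normal(32) for t in tasks}
# Make two tasks deliberately similar for the demo.
grads["breakout"] = grads["pong"] + 0.05 * rng.standard_normal(32)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cluster_tasks(grads, threshold=0.8):
    """Greedy clustering: join a task to the first cluster whose
    representative gradient is sufficiently aligned, else open a new one."""
    clusters = []  # list of (representative_grad, [task names])
    for task, g in grads.items():
        for rep, members in clusters:
            if cosine(rep, g) >= threshold:
                members.append(task)
                break
        else:
            clusters.append((g, [task]))
    return [members for _, members in clusters]

print(cluster_tasks(grads))
```

Tasks whose gradients are nearly parallel (here, pong and breakout by construction) land in the same cluster and would share an expert; tasks with unrelated gradients open clusters of their own, so expert capacity is allocated where dynamics genuinely differ.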
Boxuan Zhang
School of Automation, Beijing Institute of Technology
Weipu Zhang
Jiangxing Intelligence Inc.
Zhaohan Feng
Ph.D. Candidate at Beijing Institute of Technology
Reinforcement Learning · Embodied AI · Multi-agent Systems
Wei Xiao
School of Automation, Beijing Institute of Technology
Jian Sun
Beijing Institute of Technology
Networked control systems · time-delay systems · Security of CPS
Jie Chen
School of Automation, Beijing Institute of Technology
Gang Wang
Beijing Institute of Technology
Distributed learning · non-convex optimization · reinforcement learning · data-driven control