UniZero: Generalized and Efficient Planning with Scalable Latent World Models

📅 2024-06-15
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor scalability of world models and inefficient long-horizon planning in heterogeneous multi-task environments, this paper proposes a modular Transformer-based unified latent world model. The model jointly models dynamics prediction and decision-oriented representation within a shared latent space, integrating latent-space Monte Carlo Tree Search (MCTS) with the value-equivalence principle to unify long-term memory modeling and scalable multi-task planning. Its core innovation is a novel co-prediction architecture that overcomes the limitations of MuZero-style methods in handling task diversity and heterogeneous dependencies. Experiments demonstrate that the method significantly outperforms existing approaches on benchmarks requiring long-term memory; achieves superior scalability in Atari multi-task learning; and matches or exceeds state-of-the-art performance on single-task Atari and DeepMind Control Suite benchmarks.

📝 Abstract
Learning predictive world models is crucial for enhancing the planning capabilities of reinforcement learning (RL) agents. Recently, MuZero-style algorithms, leveraging the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, these methods struggle to scale in heterogeneous scenarios with diverse dependencies and task variability. To overcome these limitations, we introduce UniZero, a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space. By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in the latent space. We show that UniZero significantly outperforms existing baselines in benchmarks that require long-term memory. Additionally, UniZero demonstrates superior scalability in multitask learning experiments conducted on Atari benchmarks. In standard single-task RL settings, such as Atari and DMControl, UniZero matches or even surpasses the performance of current state-of-the-art methods. Finally, extensive ablation studies and visual analyses validate the effectiveness and scalability of UniZero's design choices. Our code is available at https://github.com/opendilab/LightZero.
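The abstract's central mechanism, concurrently predicting latent dynamics and decision-oriented quantities (reward, value, policy) conditioned on the learned latent history, can be sketched structurally. The toy below is our own illustration, not the authors' implementation: all class and method names (`ToyUniZeroModel`, `encode`, `step`) are hypothetical stand-ins, and the real model in the linked LightZero repository uses a learned transformer backbone rather than the arithmetic stubs here.

```python
# Structural sketch (illustrative only) of UniZero-style co-prediction:
# one backbone, conditioned on the whole latent history, jointly emits
# the next latent state (dynamics) and decision-oriented quantities.
from dataclasses import dataclass
from typing import List

@dataclass
class WorldModelStep:
    next_latent: List[float]     # predicted next latent state
    reward: float                # reward head output
    value: float                 # value head output
    policy_logits: List[float]   # policy head output (one per action)

class ToyUniZeroModel:
    """Stand-in for the transformer world model; all math is a stub."""

    def __init__(self, latent_dim: int, num_actions: int):
        self.latent_dim = latent_dim
        self.num_actions = num_actions

    def encode(self, obs: List[float]) -> List[float]:
        # Encoder: observation -> latent state (identity stub here).
        return obs[: self.latent_dim]

    def step(self, latent_history: List[List[float]], action: int) -> WorldModelStep:
        # Unlike MuZero, which steps from only the latest latent, the
        # backbone here conditions on the full latent history (crude
        # mean pooling stands in for transformer attention).
        pooled = sum(sum(z) for z in latent_history) / len(latent_history)
        next_latent = [x + 0.1 * action for x in latent_history[-1]]
        return WorldModelStep(
            next_latent=next_latent,
            reward=0.0,                               # reward head stub
            value=pooled * 0.0,                       # value head stub
            policy_logits=[0.0] * self.num_actions,   # policy head stub
        )

# One imagined rollout step, as latent-space MCTS would invoke during planning:
model = ToyUniZeroModel(latent_dim=4, num_actions=3)
z0 = model.encode([0.5, -0.2, 0.1, 0.0])
out = model.step([z0], action=1)
```

The point of the structure is that dynamics and decision heads share one latent context, which is what allows the world model and the policy/value targets to be optimized jointly.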
Problem

Research questions and friction points this paper is trying to address.

AI planning
long-term memory
multi-task learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable Shared Latent World Model
Long-term Memory Optimization
Multi-task Learning
👥 Authors
Yuan Pu
Shanghai Artificial Intelligence Laboratory
Yazhe Niu
SenseTime Research, The Chinese University of Hong Kong
Jiyuan Ren
Shanghai Artificial Intelligence Laboratory
Zhenjie Yang
Tsinghua University
Hongsheng Li
The Chinese University of Hong Kong
Yu Liu
Shanghai Artificial Intelligence Laboratory, SenseTime Research