AdaWorld: Learning Adaptable World Models with Latent Actions

📅 2025-03-24
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Existing world models rely heavily on large-scale action-labeled data and computationally expensive training, which hinders rapid adaptation to novel environments with heterogeneous action spaces and scarce annotations. To address this, we propose a self-supervised framework that eliminates the need for explicit action labels: first, video representation learning implicitly extracts action representations from inter-frame dynamics; second, an autoregressive world model is built that conditions on these latent actions. This constitutes the first approach to integrate action modeling directly into the world model pretraining stage, enabling action-agnostic universal representation learning. Our method achieves cross-action-space transfer with only minimal environment interaction. Extensive experiments across multiple environments demonstrate substantial improvements in video prediction fidelity and visual planning performance, a reduction in fine-tuning costs of over 40%, and strong generalization across diverse action spaces.

📝 Abstract
World models aim to learn action-controlled future prediction and have proven essential for the development of intelligent agents. However, most existing world models rely heavily on substantial action-labeled data and costly training, making it challenging to adapt to novel environments with heterogeneous actions through limited interactions. This limitation can hinder their applicability across broader domains. To overcome this limitation, we propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. The key idea is to incorporate action information during the pretraining of world models. This is achieved by extracting latent actions from videos in a self-supervised manner, capturing the most critical transitions between frames. We then develop an autoregressive world model that conditions on these latent actions. This learning paradigm enables highly adaptable world models, facilitating efficient transfer and learning of new actions even with limited interactions and finetuning. Our comprehensive experiments across multiple environments demonstrate that AdaWorld achieves superior performance in both simulation quality and visual planning.
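The two-stage idea in the abstract (infer a latent action from consecutive frames, then predict the next frame conditioned on it) can be illustrated with a toy sketch. This is not the AdaWorld architecture: the linear maps, the dimensions `FRAME_DIM` and `LATENT_DIM`, and all function names are hypothetical stand-ins chosen only to make the data flow concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_DIM = 8    # size of a flattened toy "frame" (hypothetical)
LATENT_DIM = 2   # size of a latent action (hypothetical)

# Randomly initialized linear maps standing in for learned networks (assumption).
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, 2 * FRAME_DIM))
W_dyn = rng.normal(scale=0.1, size=(FRAME_DIM, FRAME_DIM + LATENT_DIM))


def encode_latent_action(frame_t, frame_next):
    """Infer a latent action from two consecutive frames; no action label is used."""
    return W_enc @ np.concatenate([frame_t, frame_next])


def predict_next_frame(frame_t, latent_action):
    """Predict the next frame from the current frame and a latent action."""
    return frame_t + W_dyn @ np.concatenate([frame_t, latent_action])


def rollout(first_frame, latent_actions):
    """Autoregressive rollout: each predicted frame becomes the next input."""
    frames = [first_frame]
    for z in latent_actions:
        frames.append(predict_next_frame(frames[-1], z))
    return np.stack(frames)


def reconstruction_loss(video):
    """Self-supervised objective: rebuild frame t+1 from frame t and the inferred latent action."""
    errors = []
    for frame_t, frame_next in zip(video[:-1], video[1:]):
        z = encode_latent_action(frame_t, frame_next)
        errors.append(np.mean((predict_next_frame(frame_t, z) - frame_next) ** 2))
    return float(np.mean(errors))


# Usage on a random toy "video" of 5 frames.
video = rng.normal(size=(5, FRAME_DIM))
latents = [encode_latent_action(a, b) for a, b in zip(video[:-1], video[1:])]
predicted = rollout(video[0], latents)  # shape (5, FRAME_DIM)
print(reconstruction_loss(video))
```

In a real system both maps would be trained jointly to minimize the reconstruction loss, so that the latent action is pushed to carry the information needed to explain the transition between frames.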
Problem

Research questions and friction points this paper is trying to address.

Learning adaptable world models with latent actions
Reducing reliance on action-labeled data and costly training
Enabling efficient adaptation to novel environments with limited interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised latent action extraction from videos
Autoregressive world model conditioned on latent actions
Efficient adaptation with limited interactions and finetuning
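As an illustration of the last point, one plausible way to adapt to a new action space from a handful of interactions is to average the latent actions observed for each raw action label and use the result as a lookup table. The summary above does not specify the adaptation mechanism, so this is an assumption; `build_action_table` and the sample data are hypothetical.

```python
from collections import defaultdict

import numpy as np


def build_action_table(interactions):
    """Map each raw action label to the mean of its observed latent actions.

    `interactions` is an iterable of (action_label, latent_action) pairs,
    where each latent action would come from a pretrained latent action
    encoder (assumed to exist; not shown here).
    """
    grouped = defaultdict(list)
    for label, latent in interactions:
        grouped[label].append(np.asarray(latent, dtype=float))
    return {label: np.mean(latents, axis=0) for label, latents in grouped.items()}


# A few labeled interactions from a new environment (made-up values).
interactions = [
    ("left", [-1.0, 0.0]),
    ("left", [-0.8, 0.2]),
    ("right", [1.0, 0.0]),
]
table = build_action_table(interactions)
print(table["left"])
```

The world model could then be driven by raw action labels by looking up `table[label]` at each step, with optional finetuning to refine the mapping.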