🤖 AI Summary
This work addresses the challenge of generating physically consistent interaction predictions in contact-rich manipulation tasks, where existing robotic video world models often fall short. We propose the first fully autonomous self-play learning framework that trains a high-fidelity action-conditioned video generation model exclusively from unsupervised robot self-interaction data, without any human demonstrations. To our knowledge, this is the first demonstration of world model training through purely autonomous exploration, effectively capturing complex, long-tailed physical dynamics while enabling policy evaluation and reinforcement learning. Experiments across multiple manipulation tasks demonstrate significant improvements in both prediction fidelity and policy performance: failure prediction and policy evaluation accuracy improve by up to 40%, and real-world policy success rates increase by 65%.
📝 Abstract
Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict the physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing the complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over models trained on human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning inside the world model, improving policy success rates by 65% when deployed in the real world.