PlayWorld: Learning Robot World Models from Autonomous Play

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of generating physically consistent interaction predictions in contact-rich manipulation tasks, where existing robotic video world models often fall short. The authors propose the first fully autonomous self-play learning framework that trains a high-fidelity action-conditioned video generation model exclusively from unsupervised robot self-interaction data, without any human demonstrations. Training the world model through purely autonomous exploration captures long-tailed, complex physical dynamics while enabling downstream policy evaluation and reinforcement learning. Experiments across multiple manipulation tasks show significant gains in both prediction fidelity and policy performance: failure prediction and policy evaluation accuracy improve by up to 40%, and real-world policy success rates increase by 65%.

📝 Abstract
Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict the physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing the complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over models trained on human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning in the world model, improving policy success rates by 65% when deployed in the real world.
Problem

Research questions and friction points this paper is trying to address.

robot world models
physically consistent interactions
robot-object interaction
video prediction
manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

autonomous self-play
video world models
robotic manipulation
unsupervised learning
physical consistency
👥 Authors

Tenny Yin
Princeton University
Robotics, Machine Learning

Zhiting Mei
PhD Student, Princeton University
Robotics

Zhonghe Zheng
Princeton University

Miyu Yamane
Princeton University

David Wang
Princeton University

Jade Sceats
Princeton University

Samuel M. Bateman
Princeton University

Lihan Zha
Princeton University
Robotics

Apurva Badithela
Princeton University

Ola Shorinwa
Princeton University

Anirudha Majumdar
Associate Professor, Princeton University & Visiting Research Scientist, Google DeepMind
Robotics, Machine Learning, Motion Planning, Control