🤖 AI Summary
This work addresses the challenge of generating physically consistent interaction predictions in contact-rich manipulation tasks, where existing robotic video world models often fall short. We propose the first fully autonomous self-play learning framework that trains a high-fidelity action-conditioned video generation model exclusively from unsupervised robot self-interaction data, without any human demonstrations. To our knowledge, this is the first demonstration of world model training through purely autonomous exploration, effectively capturing complex, long-tailed physical dynamics while enabling policy evaluation and reinforcement learning. Experiments across multiple manipulation tasks demonstrate significant improvements in both prediction fidelity and policy performance: failure prediction and policy evaluation accuracy improve by up to 40%, and real-world policy success rates increase by 65%.
📝 Abstract
Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict the physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing the complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over models trained on human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning inside the world model, improving policy success rates by 65% when deployed in the real world.