🤖 AI Summary
This work addresses the limited generalization capability of Vision-Language-Action (VLA) models in real-world, open domestic environments. To this end, we propose a heterogeneous task-cooperative training paradigm. Methodologically, we integrate multi-robot platform data, web-sourced semantic resources, and hierarchical action representations to construct a hybrid multimodal joint training framework that unifies object detection, semantic subtask prediction, and low-level action generation. We further design a cross-platform VLA architecture enabling zero-shot transfer. Our approach achieves, for the first time, successful long-horizon, dexterous manipulation tasks—such as kitchen and bedroom cleaning—in entirely unseen household scenes. This yields substantial improvements in open-world generalization. The framework advances end-to-end embodied intelligence toward practical deployment by bridging the gap between simulation-trained models and real-world adaptability.
📝 Abstract
In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $pi_{0.5}$, a new model based on $pi_{0}$ that uses co-training on heterogeneous tasks to enable broad generalization. $pi_{0.5}$ uses data from multiple robots, high-level semantic prediction, web data, and other sources to enable broadly generalizable real-world robotic manipulation. Our system uses a combination of co-training and hybrid multi-modal examples that combine image observations, language commands, object detections, semantic subtask prediction, and low-level actions. Our experiments show that this kind of knowledge transfer is essential for effective generalization, and we demonstrate for the first time that an end-to-end learning-enabled robotic system can perform long-horizon and dexterous manipulation skills, such as cleaning a kitchen or bedroom, in entirely new homes.