🤖 AI Summary
Mobile robots face significant challenges in real-world deployment, including low data efficiency, difficulty in reward engineering, and the sim-to-real gap. To address these, we propose PVP4Real: an end-to-end, online human-robot collaborative learning framework that requires no pretraining, no handcrafted reward function, and no large-scale offline datasets. It integrates online imitation learning with policy-gradient-based reinforcement learning, training policies exclusively from real-time human interventions and demonstrations. Our key contributions include: (i) the first online collaborative learning paradigm fully eliminating pretraining, explicit reward design, and offline data dependency; (ii) explicit modeling of human intervention signals to enable real-time policy adaptation; and (iii) hardware-agnostic compatibility with legged and wheeled platforms using raw RGB-D inputs. Evaluated on two physical robot platforms, PVP4Real achieves task proficiency within 15 minutes and deploys successfully in complex real-world environments, demonstrating strong generalization and safe adaptation under extreme data scarcity.
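To make the intervention-driven training idea concrete, here is a deliberately simplified toy sketch (not the paper's actual algorithm, which combines imitation learning with policy-gradient RL) of a human-in-the-loop update: each step in a batch carries an `intervened` flag, and the policy is pulled toward the human's corrective action on intervened steps. The linear policy, function name `hitl_update`, and batch layout are all illustrative assumptions, not PVP4Real's API.

```python
import numpy as np

def hitl_update(theta, batch, lr=0.1):
    """One toy human-in-the-loop update on a linear policy a = theta @ obs.

    batch: list of (obs, action, intervened) tuples. `intervened` marks
    steps where the human took over, in which case `action` is the human's
    demonstrated action; otherwise it is the agent's own action.
    Names and structure are illustrative, not the paper's implementation.
    """
    grad = np.zeros_like(theta)
    n_human = 0
    for obs, action, intervened in batch:
        if intervened:
            # Behavior-cloning term: move the policy output toward
            # the human's corrective action on intervened steps only.
            pred = theta @ obs
            grad += np.outer(pred - action, obs)
            n_human += 1
    if n_human:
        theta = theta - lr * grad / n_human
    return theta

# Toy usage: a scripted "human" intervenes with actions from a target
# policy theta_true; the learner converges toward it from interventions.
rng = np.random.default_rng(0)
theta_true = np.array([[2.0, 0.0], [0.0, 1.0]])
theta = np.zeros((2, 2))
for _ in range(300):
    batch = []
    for _ in range(8):
        obs = rng.normal(size=2)
        batch.append((obs, theta_true @ obs, True))  # human intervenes
    theta = hitl_update(theta, batch)
```

The real method additionally learns from the agent's own (non-intervened) experience via a reward-free RL objective; this sketch keeps only the imitation-from-intervention term to show how the intervention signal gates which data updates the policy.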
📝 Abstract
Mobile robots are essential in applications such as autonomous delivery and hospitality services. Applying learning-based methods to mobile robot tasks has gained popularity due to their robustness and generalizability. Traditional methods such as Imitation Learning (IL) and Reinforcement Learning (RL) offer adaptability but require large datasets and carefully crafted reward functions, and face sim-to-real gaps, making efficient and safe real-world deployment challenging. We propose PVP4Real, an online human-in-the-loop learning method that combines IL and RL to address these issues. PVP4Real enables efficient real-time policy learning from online human intervention and demonstration, without reward design or any pretraining, significantly improving data efficiency and training safety. We validate our method by training two different robots -- a legged quadruped and a wheeled delivery robot -- on two mobile robot tasks, one of which even uses raw RGB-D images as observations. Training finishes within 15 minutes. Our experiments show the promising future of human-in-the-loop learning in addressing the data efficiency issue in real-world robotic tasks. More information is available at: https://metadriverse.github.io/pvp4real/