🤖 AI Summary
This study investigates data scaling laws in robotic imitation learning, asking whether a single-task policy trained on a limited amount of high-quality demonstration data can generalize zero-shot to unseen environments and to unseen objects of the same category. The authors find that zero-shot performance follows a roughly power-law relationship with the number of training environments and objects, and that diversity matters far more than the raw number of demonstrations: once the demonstrations per environment or object pass a threshold, adding more has minimal effect. Based on this, they propose a diversity-first data collection strategy that caps the number of demonstrations per environment or object at that threshold. Policies are trained with behavior cloning and evaluated under a rigorous real-world protocol in unseen environments with unseen objects. With data collected by four human operators in one afternoon, the policies for two manipulation tasks reach approximately 90% success rates in novel environments with unseen objects. These results suggest that prioritizing diversity over demonstration count makes data collection for generalizable imitation learning practical in real-world settings.
📝 Abstract
Data scaling has revolutionized fields like natural language processing and computer vision, providing models with remarkable generalization capabilities. In this paper, we investigate whether similar data scaling laws exist in robotics, particularly in robotic manipulation, and whether appropriate data scaling can yield single-task robot policies that can be deployed zero-shot for any object within the same category in any environment. To this end, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects. The diversity of environments and objects is far more important than the absolute number of demonstrations; once the number of demonstrations per environment or object reaches a certain threshold, additional demonstrations have minimal effect. Based on these insights, we propose an efficient data collection strategy. With four data collectors working for one afternoon, we collect sufficient data to enable the policies for two tasks to achieve approximately 90% success rates in novel environments with unseen objects.
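To make the "roughly power-law" claim concrete, the sketch below shows one common way to test for such a relationship: fit y ≈ a · n^b by linear regression in log-log space and check how well a straight line describes the points. This is an illustrative assumption about the fitting procedure, not the authors' code; the function name `fit_power_law`, the choice of a "performance gap" as the fitted quantity, and all numbers are hypothetical and synthetic.

```python
import numpy as np


def fit_power_law(n, y):
    """Fit y ≈ a * n**b by least squares in log-log space.

    Returns (a, b, r), where r is the Pearson correlation between
    log(n) and log(y); |r| close to 1 indicates a near power-law trend.
    """
    log_n = np.log(np.asarray(n, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    b, log_a = np.polyfit(log_n, log_y, deg=1)
    r = np.corrcoef(log_n, log_y)[0, 1]
    return float(np.exp(log_a)), float(b), float(r)


# Synthetic, illustrative data only (NOT results from the paper):
# a hypothetical performance gap that shrinks as training environments are added.
num_envs = [1, 2, 4, 8, 16, 32]
gap = [0.8 * k ** -0.5 for k in num_envs]

a, b, r = fit_power_law(num_envs, gap)
print(f"a={a:.3f}, b={b:.3f}, r={r:.4f}")
```

With real measurements, the same fit would be run on the measured quantity at each number of training environments or objects; a negative exponent b with |r| near 1 would be consistent with the power-law trend described above.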