🤖 AI Summary
In real-world robotic policy training, expert demonstration data is costly to collect, low in diversity, and hard to scale. To address these challenges, this paper proposes a scalable “rewind-and-refine” data collection paradigm. Our method integrates state rollback, real-time task success prediction, human-in-the-loop correction, and multi-robot parallel scheduling, augmented by a Task Sentinel module that autonomously requests intervention—enabling lightweight, on-demand supervision by a single human operator coordinating multiple robots. Compared with conventional approaches, our framework improves task success rates by up to 40%, matches state-of-the-art performance with less than half the demonstration data, and significantly raises per-operator data productivity. To the best of our knowledge, this is the first approach that enables large-scale acquisition of expert demonstrations that are simultaneously high-fidelity, low-cost, and diverse.
📝 Abstract
While Vision-Language-Action (VLA) models show strong generalizability across tasks, real-world deployment of robotic policies still requires large-scale, high-quality human expert demonstrations. However, data collection via human teleoperation is costly, hard to scale, and often biased toward passive demonstrations with limited diversity. To address this, we propose Genie Centurion (GCENT), a scalable and general data collection paradigm based on human rewind-and-refine guidance. When a robot execution failure occurs, GCENT reverts the system to a previous state via a rewind mechanism, after which a teleoperator provides corrective demonstrations to refine the policy. The framework supports a one-human-to-many-robots supervision scheme through a Task Sentinel module, which autonomously predicts task success and solicits human intervention only when necessary, enabling scalable supervision. Empirical results show that GCENT achieves up to 40% higher task success rates than state-of-the-art data collection methods and reaches comparable performance using less than half the data. We also quantify the data yield-to-effort ratio in multi-robot scenarios, demonstrating GCENT's potential for scalable, cost-efficient robot policy training in real-world environments.
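The rewind-and-refine loop described above can be sketched as a simple supervision cycle: each robot runs autonomously, a success predictor flags likely failures, and a single operator rewinds the flagged robot and supplies a correction. The sketch below is purely illustrative — the `Robot`, `task_sentinel`, and `supervise` names, the snapshot buffer, and the round-robin scheduling are our assumptions, not the paper's implementation.

```python
"""Illustrative sketch of a rewind-and-refine supervision loop in the spirit
of GCENT. All names, thresholds, and scheduling choices here are assumptions
for exposition, not the paper's actual system."""

import random
from collections import deque


class Robot:
    """Toy robot that snapshots its state so execution can be rewound."""

    def __init__(self, robot_id):
        self.robot_id = robot_id
        self.history = deque(maxlen=10)  # recent state snapshots
        self.state = 0

    def step(self):
        self.history.append(self.state)  # snapshot before acting
        self.state += 1                  # stand-in for one policy action

    def rewind(self):
        """Revert to the most recent snapshot (the rewind mechanism)."""
        if self.history:
            self.state = self.history.pop()


def task_sentinel(robot, failure_prob=0.3):
    """Stand-in success predictor: True means intervention is needed.
    A real Task Sentinel would score task progress from observations."""
    return random.random() < failure_prob


def supervise(robots, steps, seed=0):
    """One operator round-robins over many robots, rewinding and providing
    a corrective step (stubbed here) whenever the sentinel flags a failure."""
    random.seed(seed)
    interventions = 0
    for _ in range(steps):
        for robot in robots:
            robot.step()                 # autonomous execution
            if task_sentinel(robot):
                robot.rewind()           # roll back to a safe prior state
                robot.step()             # teleoperated correction (stub)
                interventions += 1
    return interventions


if __name__ == "__main__":
    fleet = [Robot(i) for i in range(4)]
    print("interventions:", supervise(fleet, steps=5))
```

Because interventions are only requested when the sentinel fires, the operator's attention is shared across the fleet on demand rather than dedicated to one robot, which is the source of the per-operator productivity gain the paper reports.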