AI Summary
Traditional imitation learning relies on open-loop data collection, which fails to adequately cover regions where the policy is weak. Interactive methods like DAgger address this gap but require physical robots, incurring high costs and poor scalability. The authors propose a robot-free, real-time policy iteration system that uses smartphone-based augmented reality to visualize the policy's predicted trajectories, enabling users to actively focus on failure-prone regions. Integrated with a remote inference framework and an asynchronous online fine-tuning pipeline, the system closes the learning loop within minutes. This approach enables immersive, robot-free interactive data collection for the first time, requiring only minimal human corrections in a distributed setting. It demonstrates a twofold improvement in sample efficiency over offline policies and significantly outperforms existing data augmentation strategies.
Abstract
Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing the underlying policy's weaknesses, leading to inefficient coverage of critical state distributions. Conversely, interactive methods like DAgger effectively address covariate shift but rely on physical robot execution, which is costly and difficult to scale. To reconcile this trade-off, we introduce RoboPocket, a portable system that enables Robot-Free Instant Policy Iteration using single consumer smartphones. Its core innovation is a Remote Inference framework that visualizes the policy's predicted trajectory via Augmented Reality (AR) Visual Foresight. This immersive feedback allows collectors to proactively identify potential failures and focus data collection on the policy's weak regions without requiring a physical robot. Furthermore, we implement an asynchronous Online Finetuning pipeline that continuously updates the policy with incoming data, effectively closing the learning loop in minutes. Extensive experiments demonstrate that RoboPocket adheres to data scaling laws and doubles the data efficiency compared to offline scaling strategies, overcoming their long-standing efficiency bottleneck. Moreover, our instant iteration loop boosts sample efficiency by up to 2$\times$ in distributed environments with a small number of interactive corrections per person. Project page and videos: https://robo-pocket.github.io.
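The asynchronous fine-tuning loop described above can be illustrated with a minimal Python sketch. This is not the RoboPocket implementation: `ToyPolicy`, `finetune_worker`, and `run_demo` are hypothetical names, the policy is a single scalar weight rather than a neural network, and a local thread stands in for the remote training server. It shows only the general pattern, in which a collector keeps querying the current policy for a preview while a background worker consumes corrections from a queue and updates the weights.

```python
import queue
import threading


class ToyPolicy:
    """Hypothetical stand-in for a learned policy: action = w * obs."""

    def __init__(self, w=0.0):
        self._w = w
        self._lock = threading.Lock()
        self.version = 0  # number of updates applied so far

    def predict(self, obs):
        # Inference reads whatever weights are current at call time.
        with self._lock:
            return self._w * obs

    def sgd_step(self, obs, action, lr=0.1):
        # One gradient step on the squared error 0.5 * (w * obs - action) ** 2.
        with self._lock:
            err = self._w * obs - action
            self._w -= lr * err * obs
            self.version += 1


def finetune_worker(policy, corrections):
    """Consume corrections as they arrive; a None item stops the worker."""
    while True:
        item = corrections.get()
        if item is None:
            break
        obs, action = item
        policy.sgd_step(obs, action)


def run_demo(num_corrections=200):
    policy = ToyPolicy()
    corrections = queue.Queue()
    worker = threading.Thread(target=finetune_worker, args=(policy, corrections))
    worker.start()

    for step in range(num_corrections):
        obs = 1.0 + (step % 5) * 0.25
        # Assumption: this value is what a foresight overlay would display.
        _preview = policy.predict(obs)
        # Assumption: the human correction follows the rule action = 2 * obs.
        corrections.put((obs, 2.0 * obs))

    corrections.put(None)  # signal the worker to finish
    worker.join()
    return policy


if __name__ == "__main__":
    trained = run_demo()
    print(trained.version, trained.predict(1.0))
```

Because collection never blocks on training, the preview can lag the newest corrections by a few updates. That lag is the usual trade-off of an asynchronous design.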