🤖 AI Summary
Problem: High-quality egocentric data for surgical robot learning is scarce. Method: This paper introduces dARt Vinci, a data acquisition platform that integrates augmented reality (AR) with high-fidelity physics simulation. It proposes the first AR-driven, phantom-free egocentric motion capture paradigm, which combines real-time hand tracking, full-body kinematic modeling, and overlay of virtual surgical environments to eliminate dependence on physical da Vinci systems. The platform incorporates egocentric perception modeling and lightweight data encoding to enable scalable, multi-view, resource-efficient dataset generation. Results: In a user study, data throughput increases by 41% on average and total experiment time falls by 10% on average compared with a real-robot setup, with both gains statistically significant (p < 0.01). The collected data is also over 400× smaller and sampled at double the frequency. This work establishes a scalable, high-fidelity, low-cost data infrastructure for imitation and reinforcement learning in safety-critical robotic applications.
📝 Abstract
Data scarcity has long been an issue in the robot learning community. In safety-critical domains such as surgery, obtaining high-quality data is especially difficult. This scarcity poses challenges to researchers seeking to exploit recent advances in reinforcement learning and imitation learning, which have greatly improved generalizability and enabled robots to perform tasks autonomously. We introduce dARt Vinci, a scalable data collection platform for robot learning in surgical settings. The system uses Augmented Reality (AR) hand tracking and a high-fidelity physics engine to capture subtle maneuvers in primitive surgical tasks. By eliminating the need for a physical robot setup and providing flexibility in time, space, and hardware resources (such as multiview sensors and actuators), specialized simulation becomes a viable alternative. At the same time, AR makes robot data collection more egocentric, supported by its body tracking and content overlaying capabilities. A user study, built on primitive tasks widely used for training teleoperation with da Vinci surgical robots, confirms the proposed system's efficiency and usability. Compared with real robot settings, data throughput improves across all tasks by 41% on average, and total experiment time is reduced by 10% on average. Temporal demand, as rated in the task load survey, also improves. These gains are statistically significant. Additionally, the collected data is over 400 times smaller, requiring far less storage while being recorded at double the frequency.