🤖 AI Summary
Autonomous intraocular foreign body removal faces several challenges: the difficulty of autonomous manipulation, inconsistent motion scaling, and control uncertainty from dynamic remote center of motion (RCM) drift. Method: This paper proposes AutoRing, an imitation learning framework built on RCM-ACT, a novel architecture that fuses binocular vision and kinematic data in an action-chunking Transformer with embedded real-time dynamic RCM calibration. It enables end-to-end autonomous operation using only uncalibrated microsurgical video and instrument pose, without depth sensors or system pre-calibration. Training employs imitation learning on expert demonstrations to acquire a sequential grasp-and-localize policy. Results: The method achieves fully automated grasping and precise positioning of ring-shaped foreign bodies in a biomimetic eye model, demonstrating effectiveness and robustness under low-precision perception and dynamically drifting RCM conditions, marking the first such end-to-end autonomous solution in this domain.
📝 Abstract
Intraocular foreign body removal demands millimeter-level precision in confined intraocular spaces, yet existing robotic systems predominantly rely on manual teleoperation with steep learning curves. To address the challenges of autonomous manipulation, particularly kinematic uncertainties arising from variable motion scaling and drift of the Remote Center of Motion (RCM) point, we propose AutoRing, an imitation learning framework for autonomous intraocular foreign body ring manipulation. Our approach integrates dynamic RCM calibration to resolve coordinate-system inconsistencies caused by intraocular instrument variation, and introduces the RCM-ACT architecture, which combines action-chunking transformers with real-time kinematic realignment. Trained solely on stereo visual data and instrument kinematics from expert demonstrations in a biomimetic eye model, AutoRing completes ring grasping and positioning tasks without explicit depth sensing. Experimental validation demonstrates end-to-end autonomy under uncalibrated microscopy conditions. The results provide a viable framework for developing intelligent eye-surgical systems capable of complex intraocular procedures.
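To make the interplay of action chunking and dynamic RCM calibration concrete, here is a minimal, heavily simplified sketch of the control loop such an architecture implies. Everything here is an assumption for illustration, not the paper's implementation: the chunk horizon `CHUNK`, the stub `policy` and `estimate_rcm` callables, and the modeling choice that the policy emits tool-tip targets relative to the RCM estimate current at prediction time, with RCM drift treated as a pure translation.

```python
import numpy as np

CHUNK = 4  # actions predicted per policy query (assumed chunk horizon)

def realign_chunk(local_target, rcm_at_pred, rcm_now):
    """Re-express a tip target, given relative to the RCM estimate used at
    prediction time, relative to the newly calibrated RCM, so that the
    world-frame target is preserved under (assumed translational) drift."""
    return local_target + (rcm_at_pred - rcm_now)

def run_episode(policy, estimate_rcm, steps=8):
    """Chunked execution: query the policy every CHUNK steps and realign
    each pending action whenever the RCM estimate is recalibrated."""
    rcm = estimate_rcm(0)
    chunk, rcm_at_pred = None, rcm
    tip_world = []
    for t in range(steps):
        if t % CHUNK == 0:                      # new action chunk
            chunk, rcm_at_pred = policy(t), rcm
        rcm = estimate_rcm(t)                   # dynamic RCM calibration
        local = realign_chunk(chunk[t % CHUNK], rcm_at_pred, rcm)
        tip_world.append(rcm + local)           # command in world frame
    return np.array(tip_world)
```

The realignment keeps commanded world-frame targets invariant to RCM drift: `rcm_now + (local + rcm_at_pred - rcm_now)` collapses to `rcm_at_pred + local`, which is what a chunk predicted under a stale RCM estimate intended.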