🤖 AI Summary
Laparoscopic surgery suffers from limited spatial perception due to monocular, two-dimensional endoscopic vision, while conventional training simulators lack effective depth cues, increasing the risk of instrument mislocalization and procedural errors. To address this, we propose an AI-enhanced mixed-reality training framework that achieves, for the first time, real-time fusion of AI-driven 3D visual feedback with live laparoscopic video. Implemented on NVIDIA Isaac Sim, the framework integrates precise instrument pose estimation, instrument-tissue interaction detection, and dynamic 3D visualization rendering, delivering real-time spatial guidance under standard clinical viewing angles. Experimental results demonstrate significant improvements in trainees' ability to discriminate depth, tissue contact status, and instrument orientation. Crucially, the system reliably differentiates visually similar yet spatially distinct surgical scenarios, thereby enhancing both training safety and efficiency.
📝 Abstract
Laparoscopic surgery constrains surgeons' spatial awareness because procedures are performed through a monocular, two-dimensional (2D) endoscopic view. Conventional training methods using dry-lab models or recorded videos provide limited depth cues, often leading trainees to misjudge instrument position and perform ineffective or unsafe maneuvers. To address this limitation, we present an AI-assisted training framework developed in NVIDIA Isaac Sim that couples the standard 2D laparoscopic feed with synchronized three-dimensional (3D) visual feedback delivered through a mixed-reality (MR) interface. While trainees operate using the clinical 2D view, validated AI modules continuously localize surgical instruments and detect instrument-tissue interactions in the background. When a spatial misjudgment is detected, 3D visual feedback is displayed to the trainee while the original operative perspective is preserved. Our framework supports a range of surgical tasks, including navigation, manipulation, transfer, cutting, and suturing. Visually similar 2D cases can be disambiguated through the added 3D context, improving depth perception, contact awareness, and understanding of tool orientation.