AI Summary
In minimally invasive surgical robotics, high-resolution endoscopic images exhibit ambiguous tissue boundaries and make real-time depth estimation difficult. To address both problems, this paper proposes a lightweight, real-time stereo matching method. The approach has two main components: (1) a novel 3D Mamba Coordinate Attention module that models long-range spatial dependencies; and (2) a wavelet-domain high-frequency disparity refinement module that uses the discrete wavelet transform (DWT) to recover blurred tissue boundaries. The method employs a compact encoder-decoder architecture integrating 3D Mamba with coordinate attention. Evaluated on the SCARED and SERV-CT datasets, it achieves state-of-the-art accuracy while maintaining an inference speed of 42 FPS, meeting stringent clinical real-time requirements.
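To make the wavelet-domain idea concrete, the following is a minimal sketch of a single-level 2D Haar DWT applied to a toy disparity map. It is not the paper's refinement module; the function names (`haar_dwt2`, `haar_idwt2`), the choice of the Haar wavelet, and the toy values are all assumptions made for illustration. The sketch only shows that a sharp disparity step, such as a tissue boundary, ends up in a high-frequency detail band while the low-frequency band holds the smooth content.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT (illustrative helper, not from the paper).

    Returns (approx, detail_cols, detail_rows, detail_diag), each half the
    input size. Height and width must be even.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # low-frequency content
    detail_cols = (a - b + c - d) / 2.0  # differences between adjacent columns
    detail_rows = (a + b - c - d) / 2.0  # differences between adjacent rows
    detail_diag = (a - b - c + d) / 2.0  # diagonal differences
    return approx, detail_cols, detail_rows, detail_diag

def haar_idwt2(approx, detail_cols, detail_rows, detail_diag):
    """Inverse of haar_dwt2; reconstructs the original array exactly."""
    out = np.empty((2 * approx.shape[0], 2 * approx.shape[1]))
    out[0::2, 0::2] = (approx + detail_cols + detail_rows + detail_diag) / 2.0
    out[0::2, 1::2] = (approx - detail_cols + detail_rows - detail_diag) / 2.0
    out[1::2, 0::2] = (approx + detail_cols - detail_rows - detail_diag) / 2.0
    out[1::2, 1::2] = (approx - detail_cols - detail_rows + detail_diag) / 2.0
    return out

# Toy 4x8 disparity map (hypothetical values): a step from 10 to 20 pixels
# between columns 2 and 3, placed inside a 2x2 block so the non-overlapping
# Haar filter responds to it.
disparity = np.full((4, 8), 10.0)
disparity[:, 3:] = 20.0

approx, detail_cols, detail_rows, detail_diag = haar_dwt2(disparity)
reconstructed = haar_idwt2(approx, detail_cols, detail_rows, detail_diag)
```

Here only `detail_cols` is nonzero, and only in the block column that contains the step. A refinement stage operating in this domain could in principle act on such bands before inverting the transform, though how the paper's module does this is not specified in the summary.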
Abstract
Real-time acquisition of accurate scene depth is essential for automated robotic minimally invasive surgery, and stereo matching on binocular endoscope images can provide it. However, existing algorithms struggle with ambiguous tissue boundaries and with real-time performance on the high-resolution endoscopic scenes that are now prevalent. We propose LightEndoStereo, a lightweight real-time stereo matching method for endoscopic images. We introduce a 3D Mamba Coordinate Attention module that streamlines cost aggregation by generating position-sensitive attention maps and by capturing long-range dependencies across spatial dimensions with a Mamba block. Additionally, we introduce a High-Frequency Disparity Optimization module that refines disparity estimates at tissue boundaries by enhancing high-frequency information in the wavelet domain. Our method is evaluated on the SCARED and SERV-CT datasets, achieving state-of-the-art matching accuracy and a real-time inference speed of 42 FPS. The code is available at https://github.com/Sonne-Ding/LightEndoStereo.
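The link between stereo matching and depth can be illustrated with the standard rectified-stereo relation, depth = focal length × baseline / disparity. The sketch below assumes a rectified image pair; the function name `disparity_to_depth` and the camera parameters (1000 px focal length, 4 mm baseline) are hypothetical values chosen for illustration and are not taken from the paper or its datasets.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_mm, min_disparity=1e-6):
    """Convert disparity (pixels) to depth (mm) for a rectified stereo pair.

    Pixels whose disparity is not above min_disparity are treated as invalid
    and mapped to infinity instead of dividing by zero.
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(d.shape, np.inf)
    valid = d > min_disparity
    depth[valid] = focal_length_px * baseline_mm / d[valid]
    return depth

# Hypothetical camera parameters, for illustration only.
focal_length_px = 1000.0
baseline_mm = 4.0

disparity = np.array([40.0, 80.0, 0.0])  # last entry: no valid match
depth_mm = disparity_to_depth(disparity, focal_length_px, baseline_mm)
# depth_mm is [100.0, 50.0, inf]
```

Because depth is inversely proportional to disparity, a fixed disparity error causes a larger depth error for distant surfaces than for near ones, which is one reason boundary-accurate disparity matters.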