LightEndoStereo: A Real-time Lightweight Stereo Matching Method for Endoscopy Images

📅 2025-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In robotic minimally invasive surgery, real-time depth estimation from high-resolution endoscopic images is difficult because tissue boundaries are ambiguous and existing stereo matching algorithms struggle to run in real time. To address these issues, this paper proposes LightEndoStereo, a lightweight, real-time stereo matching method with two main components: (1) a 3D Mamba Coordinate Attention module that generates position-sensitive attention maps and models long-range spatial dependencies during cost aggregation; and (2) a High-Frequency Disparity Optimization module that uses the discrete wavelet transform (DWT) to refine disparity estimates at blurred tissue boundaries by enhancing high-frequency information. The method employs a compact encoder-decoder architecture integrating 3D Mamba with coordinate attention. Evaluated on the SCARED and SERV-CT datasets, it reports state-of-the-art matching accuracy at a real-time inference speed of 42 FPS.

📝 Abstract

Real-time acquisition of accurate depth of scene is essential for automated robotic minimally invasive surgery, and stereo matching with binocular endoscopy can generate such depth. However, existing algorithms struggle with ambiguous tissue boundaries and real-time performance in prevalent high-resolution endoscopic scenes. We propose LightEndoStereo, a lightweight real-time stereo matching method for endoscopic images. We introduce a 3D Mamba Coordinate Attention module to streamline the cost aggregation process by generating position-sensitive attention maps and capturing long-range dependencies across spatial dimensions using the Mamba block. Additionally, we introduce a High-Frequency Disparity Optimization module to refine disparity estimates at tissue boundaries by enhancing high-frequency information in the wavelet domain. Our method is evaluated on the SCARED and SERV-CT datasets, achieving state-of-the-art matching accuracy and a real-time inference speed of 42 FPS. The code is available at https://github.com/Sonne-Ding/LightEndoStereo.
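To illustrate why the wavelet domain is a natural place to work on boundaries, here is a minimal sketch of a single-level 2D Haar transform applied to a toy disparity map. It is not the paper's High-Frequency Disparity Optimization module: the Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the 8x8 toy map are all assumptions chosen for illustration. It only shows that a disparity step (a "tissue boundary") appears in a high-frequency subband while flat regions contribute nothing there.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT (illustrative; needs even height and width).

    Returns (ll, d_cols, d_rows, d_diag), each half the input size.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0      # low-frequency approximation
    d_cols = (a - b + c - d) / 2.0  # differences across columns
    d_rows = (a + b - c - d) / 2.0  # differences across rows
    d_diag = (a - b - c + d) / 2.0  # diagonal differences
    return ll, d_cols, d_rows, d_diag

def haar_idwt2(ll, d_cols, d_rows, d_diag):
    """Exact inverse of haar_dwt2."""
    out = np.empty((ll.shape[0] * 2, ll.shape[1] * 2), dtype=float)
    out[0::2, 0::2] = (ll + d_cols + d_rows + d_diag) / 2.0
    out[0::2, 1::2] = (ll - d_cols + d_rows - d_diag) / 2.0
    out[1::2, 0::2] = (ll + d_cols - d_rows - d_diag) / 2.0
    out[1::2, 1::2] = (ll - d_cols - d_rows + d_diag) / 2.0
    return out

# Toy disparity map (hypothetical values): a vertical step between columns 2 and 3.
disp = np.full((8, 8), 10.0)
disp[:, 3:] = 20.0

ll, d_cols, d_rows, d_diag = haar_dwt2(disp)
recon = haar_idwt2(ll, d_cols, d_rows, d_diag)

# Only the column-difference subband responds, and only at the step.
print(np.argwhere(d_cols != 0))
```

Because the transform is exactly invertible, a refinement step could in principle modify only the high-frequency subbands and reconstruct the disparity map without disturbing the low-frequency content; how the paper does this is described in the paper and its repository, not in this sketch.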

Problem

Research questions and friction points this paper is trying to address.

Real-time depth acquisition for robotic surgery

Handling ambiguous tissue boundaries in endoscopy

Achieving high accuracy and speed in stereo matching
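The link between stereo matching and depth is the standard rectified-stereo relation depth = focal length × baseline / disparity. The sketch below applies it to a disparity array; the function name `disparity_to_depth` and the camera values (1000 px focal length, 4 mm baseline) are hypothetical and are not taken from the paper or its datasets.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_mm, eps=1e-6):
    """Convert disparity (pixels) to depth (mm) for a rectified stereo pair.

    Pixels with disparity <= eps are treated as invalid and set to infinity.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_mm / disparity[valid]
    return depth

# Hypothetical camera parameters, for illustration only.
depth = disparity_to_depth([80.0, 0.0, 40.0], focal_px=1000.0, baseline_mm=4.0)
print(depth)
```

Since depth is inversely proportional to disparity, a fixed disparity error translates into a larger depth error at greater distances, which is one reason boundary accuracy in the disparity map matters.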

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Mamba Coordinate Attention module

High-Frequency Disparity Optimization module

Real-time inference speed of 42 FPS
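As a quick arithmetic check, 42 FPS corresponds to a budget of roughly 23.8 ms per stereo pair. The sketch below computes that budget and shows one simple way to measure throughput for any callable; the helper names `frame_budget_ms` and `measure_fps` are hypothetical, and this is not the benchmarking procedure used in the paper.

```python
import time

def frame_budget_ms(fps):
    """Per-frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / fps

def measure_fps(fn, n_runs=20, warmup=3):
    """Estimate calls per second of fn, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

budget = frame_budget_ms(42)
print(f"{budget:.2f} ms per frame")
```

When timing a GPU model this way, the device usually has to be synchronized before reading the clock, because kernel launches are asynchronous; otherwise the measured rate can be overly optimistic.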

🔎 Similar Papers

EndoPerfect: A Hybrid NeRF-Stereo Vision Approach Pioneering Monocular Depth Estimation and 3D Reconstruction in Endoscopy