🤖 AI Summary
This study addresses the challenge of sparse and inaccurate self-supervised depth estimation in endoscopic surgery, caused by weak textures and varying illumination, which hinders reliable 3D reconstruction and instrument navigation. To overcome this limitation, the work proposes an end-to-end multimodal fusion framework that, for the first time, integrates diffusion models into endoscopic depth completion. The method jointly leverages endoscopic images, sparse depth maps, and their gradient features to generate dense, high-fidelity depth maps. Experiments on two public endoscopic datasets show that the proposed approach significantly outperforms existing methods in both depth accuracy and robustness, mitigating visual artifacts in complex surgical scenes and providing reliable geometric perception for surgical navigation.
📝 Abstract
Accurate depth estimation plays a critical role in the navigation of endoscopic surgical robots, forming the foundation for 3D reconstruction and safe instrument guidance. Fine-tuning pretrained depth models, however, relies heavily on endoscopic surgical datasets with precise depth annotations. While existing self-supervised depth estimation techniques eliminate the need for such annotations, their performance degrades in environments with weak textures and variable lighting, leading to sparse reconstructions with invalid depth estimates. Depth completion from sparse depth maps can mitigate these issues and improve accuracy, yet despite advances in depth completion in general domains, its application in endoscopy remains limited. To overcome these limitations, we propose EndoDDC, an endoscopic depth completion method that fuses images, sparse depth maps, and depth gradient features, and refines depth maps through a diffusion model, addressing the weak textures and light reflections of endoscopic environments. Extensive experiments on two publicly available endoscopy datasets show that our approach outperforms state-of-the-art models in both depth accuracy and robustness, demonstrating its potential to reduce visual errors in complex endoscopic environments. Our code will be released at https://github.com/yinheng-lin/EndoDDC.
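To make the multimodal input concrete, here is a minimal sketch of how an endoscopic image, a sparse depth map, and depth gradient features might be stacked into a single fusion tensor before being fed to a completion network. The helper names, the use of finite-difference gradients, and the channel layout are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def depth_gradients(depth):
    """Finite-difference gradients of an (H, W) depth map.
    Hypothetical helper: the paper does not specify its gradient operator."""
    gy, gx = np.gradient(depth)  # derivatives along rows and columns
    return gx, gy

def fuse_inputs(rgb, sparse_depth):
    """Stack RGB (3), sparse depth (1), a validity mask (1), and depth
    gradients (2) into one (H, W, 7) array -- one plausible input layout
    for a multimodal depth-completion model."""
    mask = (sparse_depth > 0).astype(np.float32)  # 1 where a depth sample exists
    gx, gy = depth_gradients(sparse_depth)
    return np.dstack([rgb, sparse_depth, mask, gx, gy])

# Toy example: a 4x4 image with two sparse depth samples.
rgb = np.random.rand(4, 4, 3).astype(np.float32)
sd = np.zeros((4, 4), dtype=np.float32)
sd[1, 1], sd[2, 3] = 0.5, 0.8
fused = fuse_inputs(rgb, sd)  # shape (4, 4, 7)
```

In practice the diffusion model would iteratively denoise a dense depth map conditioned on such a fused tensor; this sketch only shows the input assembly step.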