🤖 AI Summary
This work addresses dynamic 3D reconstruction from endoscopic video, which is made difficult by tissue deformation, monocular imaging, varying illumination, occlusions, and unknown camera trajectories. The authors propose NeRFscopy, a self-supervised neural rendering approach that learns directly from monocular video without requiring templates or pretrained models. The method jointly models a canonical radiance field and a time-dependent deformation field parameterized by SE(3) transformations, yielding an implicit 3D representation that supports novel view synthesis and reconstruction of deformable tissue. Across a range of challenging endoscopy scenes, the method is reported to achieve accurate novel view synthesis and to outperform competing methods.
📝 Abstract
Endoscopy is essential in medical imaging, where it is used for diagnosis, prognosis and treatment. A robust dynamic 3D reconstruction pipeline for endoscopic videos could enhance visualization, improve diagnostic accuracy, aid treatment planning, and guide surgical procedures. However, challenges arise from the deformable nature of the tissues, the use of monocular cameras, illumination changes, occlusions and unknown camera trajectories. Inspired by neural rendering, we introduce NeRFscopy, a self-supervised pipeline for novel view synthesis and 3D reconstruction of deformable endoscopic tissues from a monocular video. NeRFscopy includes a deformable model with a canonical radiance field and a time-dependent deformation field parameterized by SE(3) transformations. In addition, we introduce sophisticated terms that exploit the color images efficiently, so that a 3D implicit model is learned solely from data, without assuming any template or pre-trained model. NeRFscopy achieves accurate results in terms of novel view synthesis, outperforming competing methods across various challenging endoscopy scenes.
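
To make the deformable model concrete, the following is a minimal sketch of how a time-dependent SE(3) deformation field can be combined with a canonical radiance field. It is not the NeRFscopy implementation: the function names (`se3_exp`, `query_deformed`, `toy_deformation`, `toy_canonical_field`), the twist parameterization, and the direction of the warp (observed point to canonical space) are assumptions chosen for illustration, and the two "fields" are hand-written stand-ins for what would be learned networks.

```python
import numpy as np

def skew(w):
    """Return the 3x3 matrix K such that K @ p equals np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(omega, v):
    """Exponential map from a twist (omega, v) to a rigid motion (R, t)."""
    theta = np.linalg.norm(omega)
    K = skew(omega)
    if theta < 1e-8:
        # First-order approximation near the identity transform.
        return np.eye(3) + K, v.copy()
    A = np.sin(theta) / theta
    B = (1.0 - np.cos(theta)) / theta**2
    C = (theta - np.sin(theta)) / theta**3
    R = np.eye(3) + A * K + B * (K @ K)   # Rodrigues' rotation formula
    V = np.eye(3) + B * K + C * (K @ K)   # maps v to the translation part
    return R, V @ v

def toy_deformation(x, t):
    """Hypothetical deformation field: returns a twist for point x at time t.
    A learned model would predict this from (x, t); here it is hard-coded."""
    omega = t * np.array([0.0, 0.0, np.pi / 2])
    v = t * np.array([0.1, 0.0, 0.0])
    return omega, v

def toy_canonical_field(x_canonical):
    """Hypothetical canonical radiance field: returns (density, rgb)."""
    density = float(np.exp(-np.dot(x_canonical, x_canonical)))
    rgb = 0.5 * (np.tanh(x_canonical) + 1.0)
    return density, rgb

def query_deformed(x, t, deformation=toy_deformation,
                   canonical=toy_canonical_field):
    """Warp an observed point into canonical space, then query the field."""
    omega, v = deformation(x, t)
    R, trans = se3_exp(omega, v)
    x_canonical = R @ x + trans
    return canonical(x_canonical)
```

Calling `query_deformed(np.array([1.0, 0.0, 0.0]), t)` for several values of `t` shows the same observed point landing at different canonical locations over time. In a full pipeline, densities and colors obtained this way along each camera ray would be composited by volume rendering and compared against the input frames for self-supervision.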