🤖 AI Summary
This work addresses the challenge of sparse and incomplete Structure-from-Motion (SfM) reconstructions from endoscopic videos by proposing SuperPoint-E, a novel method built on a first-of-its-kind Tracking Adaptation supervision mechanism that refines local feature detection and description. By enhancing both feature density and discriminability, SuperPoint-E yields high-precision matches with minimal reliance on guided matching. Experimental results demonstrate that, compared to the original SuperPoint and the gold-standard COLMAP pipeline, SuperPoint-E produces significantly denser and more complete 3D reconstructions on long video sequences, while substantially improving both feature detection accuracy and matching success rates.
📝 Abstract
In this work, we focus on improving local feature extraction to boost the performance of Structure-from-Motion (SfM) on endoscopy videos. We present SuperPoint-E, a new local feature extraction method that, using our proposed Tracking Adaptation supervision strategy, significantly improves the quality of feature detection and description in endoscopy. Extensive experiments on real endoscopy recordings identify the most suitable configuration of our approach and evaluate the quality of SuperPoint-E features. Comparison with other baselines also shows that our 3D reconstructions are denser and cover more, and longer, video segments, because our detector fires more densely and our features are more likely to survive (i.e., they have higher detection precision). In addition, our descriptor is more discriminative, making the guided matching step almost redundant. The presented approach brings significant improvements in the 3D reconstructions obtained via SfM on endoscopy videos, compared to both the original SuperPoint and the gold-standard COLMAP SfM pipeline.