🤖 AI Summary
Surgical instrument segmentation in endoscopic video is challenging because of frequent occlusions, rapid motion, and specular artifacts. In this setting, SAM3 suffers from indiscriminate memory updates, a fixed memory capacity, and weak identity recovery after occlusion. This paper proposes ReMeDI-SAM3, a training-free, memory-augmented extension of SAM3 with three components: (1) relevance-aware memory filtering with a dedicated occlusion-aware memory, (2) piecewise memory interpolation that expands the effective memory capacity, and (3) a re-identification mechanism that combines feature-based similarity matching with temporal majority voting. Built on SAM3's spatio-temporal modeling, ReMeDI-SAM3 adds relevance-aware memory management and similarity-based retrieval. Evaluated zero-shot on EndoVis17 and EndoVis18, it achieves absolute mean class IoU (mcIoU) improvements of +7.0% and +16.2%, respectively, over vanilla SAM3, and outperforms prior state-of-the-art methods that require full model training. The approach improves temporal consistency and robustness to occlusion without any parameter optimization.
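To make the re-identification idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes each instrument has a stored feature vector (a "prototype") captured before the occlusion, matches each new frame's feature to the most similar prototype by cosine similarity, and commits to an identity only once one label holds a strict majority of a sliding window of recent votes. The class `ReIdentifier`, the helper `cosine`, the instrument labels, and the parameters `window` and `min_sim` are hypothetical names chosen for illustration, and the simple sliding-window majority rule stands in for whatever voting scheme the paper actually uses.

```python
import math
from collections import Counter, deque


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ReIdentifier:
    """Hypothetical re-ID helper: per-frame similarity matching plus temporal majority vote."""

    def __init__(self, prototypes, window=5, min_sim=0.5):
        # prototypes: {instrument_id: feature vector stored before the occlusion}
        self.prototypes = prototypes
        self.votes = deque(maxlen=window)  # per-frame guesses inside the voting window
        self.min_sim = min_sim  # frames whose best match falls below this cast no vote

    def observe(self, feature):
        """Match one frame's feature to the closest prototype, record a vote, return the decision."""
        best_id, best_sim = max(
            ((pid, cosine(feature, proto)) for pid, proto in self.prototypes.items()),
            key=lambda item: item[1],
        )
        if best_sim >= self.min_sim:
            self.votes.append(best_id)
        return self.decide()

    def decide(self):
        """Return the majority identity once it holds more than half the window, else None."""
        if not self.votes:
            return None
        winner, count = Counter(self.votes).most_common(1)[0]
        return winner if count > self.votes.maxlen // 2 else None


if __name__ == "__main__":
    protos = {"grasper": [1.0, 0.0, 0.0], "scissors": [0.0, 1.0, 0.0]}
    reid = ReIdentifier(protos, window=5)
    frames = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0], [0.95, 0.05, 0.1], [0.85, 0.2, 0.0]]
    print([reid.observe(f) for f in frames])  # [None, None, None, 'grasper']
```

In the example, the second frame briefly resembles the other instrument, so `observe` returns `None` until one identity holds at least three of five votes. Delaying the decision this way trades a few frames of latency for robustness against single-frame mismatches, such as those caused by specular highlights.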
📝 Abstract
Accurate surgical instrument segmentation in endoscopic videos is crucial for computer-assisted interventions, yet remains challenging due to frequent occlusions, rapid motion, specular artefacts, and long-term instrument re-entry. While SAM3 provides a powerful spatio-temporal framework for video object segmentation, its performance in surgical scenes is limited by indiscriminate memory updates, fixed memory capacity, and weak identity recovery after occlusions. We propose ReMeDI-SAM3, a training-free, memory-enhanced extension of SAM3 that addresses these limitations through three components: (i) relevance-aware memory filtering with a dedicated occlusion-aware memory for storing pre-occlusion frames, (ii) a piecewise interpolation scheme that expands the effective memory capacity, and (iii) a feature-based re-identification module with temporal voting for reliable post-occlusion identity disambiguation. Together, these components mitigate error accumulation and enable reliable recovery after occlusions. Evaluations on EndoVis17 and EndoVis18 under a zero-shot setting show absolute mean class IoU (mcIoU) improvements of around 7% and 16%, respectively, over vanilla SAM3, outperforming even prior training-based approaches. Project page: https://valaybundele.github.io/remedi-sam3/.
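The memory-management side can be illustrated with a similarly hedged Python sketch; it is not the authors' code and does not use any SAM3 API. It assumes every frame yields a memory entry carrying a scalar reliability score and a visibility flag (how ReMeDI-SAM3 actually scores relevance or detects occlusion is not specified here). Entries below a relevance threshold are never written, the regular memory is a fixed-size FIFO, and on a visible-to-occluded transition the last reliable frame is copied into a small separate store that FIFO eviction cannot touch, so it remains available when the instrument re-enters. `MemoryEntry`, `FilteredMemory`, and the parameters `capacity`, `occlusion_capacity`, and `min_relevance` are hypothetical, and the piecewise interpolation that expands capacity is not modelled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    frame_idx: int
    relevance: float  # hypothetical per-frame reliability score in [0, 1]
    visible: bool     # hypothetical flag: is the instrument visible in this frame?


class FilteredMemory:
    """Hypothetical memory bank with relevance filtering and a separate pre-occlusion store."""

    def __init__(self, capacity=6, occlusion_capacity=2, min_relevance=0.7):
        self.recent = deque(maxlen=capacity)  # FIFO of reliable frames
        self.pre_occlusion = deque(maxlen=occlusion_capacity)  # not evicted by `recent`
        self.min_relevance = min_relevance
        self._last_reliable = None  # most recent frame that passed the filter
        self._was_visible = False

    def update(self, entry):
        """Decide whether (and where) a new frame is written to memory."""
        if entry.visible and entry.relevance >= self.min_relevance:
            # Relevance filtering: only confident, visible frames enter memory.
            self.recent.append(entry)
            self._last_reliable = entry
        elif (not entry.visible and self._was_visible
              and self._last_reliable is not None
              and self._last_reliable not in self.pre_occlusion):
            # Visible -> occluded transition: set the last reliable frame aside once.
            self.pre_occlusion.append(self._last_reliable)
        self._was_visible = entry.visible

    def read(self):
        """Frame indices the tracker would condition on: anchors first, then recent frames."""
        anchors = [e for e in self.pre_occlusion if e not in self.recent]
        return [e.frame_idx for e in anchors + list(self.recent)]


if __name__ == "__main__":
    mem = FilteredMemory(capacity=3)
    stream = [(0, 0.9, True), (1, 0.95, True), (2, 0.4, True), (3, 0.1, False),
              (4, 0.1, False), (5, 0.9, True), (6, 0.9, True), (7, 0.9, True)]
    for idx, rel, vis in stream:
        mem.update(MemoryEntry(idx, rel, vis))
    print(mem.read())  # [1, 5, 6, 7]
```

In the example, frame 2 is dropped for low relevance and frames 0 and 1 are evicted from the three-slot FIFO, yet frame 1 is still read back as a pre-occlusion anchor. Keeping the anchor store separate means a long occlusion cannot push the last clean view of the instrument out of memory.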