🤖 AI Summary
Low-quality optical flow in endoscopic videos yields unreliable motion cues, hindering unsupervised surgical instrument segmentation. To address this, we propose a collaborative optimization framework: (1) extracting optical flow boundaries to enhance structural priors; (2) designing an adaptive frame quality assessment module to select reliable temporal segments; and (3) introducing variable-frame-rate contrastive learning for fine-tuning. This work is the first to systematically reformulate the utilization paradigm of low-quality optical flow in unsupervised video object segmentation (VOS), breaking away from traditional heavy reliance on optical flow accuracy. On the EndoVis2017 VOS and Challenge benchmarks, our method achieves mIoU scores of 0.75 and 0.72, respectively—substantially outperforming existing unsupervised approaches. Moreover, it significantly reduces clinical annotation burden and improves adaptability to novel surgical scenarios.
📝 Abstract
Video-based surgical instrument segmentation plays an important role in robot-assisted surgeries. Unlike supervised settings, unsupervised segmentation relies heavily on motion cues, which are challenging to discern due to the typically lower quality of optical flow in surgical footage compared to natural scenes. This presents a considerable burden for the advancement of unsupervised segmentation techniques. In our work, we address the challenge of enhancing model performance despite the inherent limitations of low-quality optical flow. Our methodology employs a three-pronged approach: extracting boundaries directly from the optical flow, selectively discarding frames with inferior flow quality, and employing a fine-tuning process with variable frame rates. We thoroughly evaluate our strategy on the EndoVis2017 VOS dataset and Endovis2017 Challenge dataset, where our model demonstrates promising results, achieving a mean Intersection-over-Union (mIoU) of 0.75 and 0.72, respectively. Our findings suggest that our approach can greatly decrease the need for manual annotations in clinical environments and may facilitate the annotation process for new datasets. The code is available at https://github.com/wpr1018001/Rethinking-Low-quality-Optical-Flow.git