One Patient's Annotation is Another One's Initialization: Towards Zero-Shot Surgical Video Segmentation with Cross-Patient Initialization

📅 2025-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current surgical video segmentation methods rely on manual initialization, hindering real-time clinical deployment. To address this, we propose a cross-patient frame initialization paradigm—introducing, for the first time, annotated frames from *other* patients as zero-shot initialization sources, thereby eliminating dependence on target-patient annotations. Our method builds upon a video object segmentation framework and integrates three key components: cross-patient feature transfer, frame-wise similarity assessment, and robust spatio-temporal alignment—enabling fully automatic, human-free target tracking initiation. Evaluated across multiple surgical video datasets under zero-shot settings, our approach achieves state-of-the-art performance (improving mean J&F score by 2.1%), significantly reduces manual intervention frequency, and demonstrates strong feasibility for clinical integration.

Technology Category

Application Category

📝 Abstract
Video object segmentation is an emerging technology that is well-suited for real-time surgical video segmentation, offering valuable clinical assistance in the operating room by ensuring consistent frame tracking. However, its adoption is limited by the need for manual intervention to select the tracked object, making it impractical in surgical settings. In this work, we tackle this challenge with an innovative solution: using previously annotated frames from other patients as the tracking frames. We find that this unconventional approach can match or even surpass the performance of using patients' own tracking frames, enabling more autonomous and efficient AI-assisted surgical workflows. Furthermore, we analyze the benefits and limitations of this approach, highlighting its potential to enhance segmentation accuracy while reducing the need for manual input. Our findings provide insights into key factors influencing performance, offering a foundation for future research on optimizing cross-patient frame selection for real-time surgical video analysis.
Problem

Research questions and friction points this paper is trying to address.

Enables zero-shot surgical video segmentation using cross-patient initialization.
Reduces manual intervention in AI-assisted surgical workflows.
Improves segmentation accuracy by leveraging annotated frames from other patients.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-patient initialization for zero-shot segmentation
Utilizes annotated frames from other patients
Enhances accuracy and reduces manual intervention
Seyed Amir Mousavi
Seyed Amir Mousavi
PhD. Sturdent, Ghent University
Computer VisionNatural Language ProcessingMachine Learning
Utku Ozbulak
Utku Ozbulak
Research Professor at Ghent University
Trustworthy AIMedical imagingBiomedical imagingSelf-supervised learning
F
Francesca Tozzi
Department of GI Surgery, Ghent University Hospital, Ghent, Belgium; Department of Human Structure and Repair, Ghent University, Ghent, Belgium
N
Nikdokht Rashidian
Department of Human Structure and Repair, Ghent University, Ghent, Belgium; Department of HPB Surgery & Liver Transplantation, Ghent University Hospital, Ghent, Belgium
Wouter Willaert
Wouter Willaert
Hoofddocent anatomie, universiteit gent
peritoneal metastasessurgical education
J
J. Vankerschaver
Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
W
W. D. Neve
Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, Republic of Korea; IDLab, ELIS, Ghent University, Ghent, Belgium