SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection

📅 2025-03-05
🤖 AI Summary
This study examines how well SAM 2 segments anatomical tissue in surgical videos and images, both zero-shot and after fine-tuning with limited data. The image encoder and mask decoder are fine-tuned on 50 to 400 samples per class drawn from five public datasets. Segmentation is guided by point prompts (1 to 10 points) and evaluated with the weighted mean Dice coefficient (WMDC). The fine-tuned model, SurgiSAM 2, achieves a 17.9% relative WMDC gain over the baseline SAM 2, with a best validation WMDC of 0.92 (10 prompt points, 400 samples per class). On the test subset it reaches a WMDC of 0.91 and outperforms previously reported state-of-the-art (SOTA) results in 24 of 30 classes (80%); on unseen organ classes it achieves SOTA in 7 of 9 (77.8%). The results suggest that SAM 2 could support automated or semi-automated annotation pipelines for surgical video.

📝 Abstract
Background: We evaluate SAM 2 for surgical scene understanding by examining its semantic segmentation capabilities for organs/tissues both in zero-shot scenarios and after fine-tuning.
Methods: We utilized five public datasets to evaluate and fine-tune SAM 2 for segmenting anatomical tissues in surgical videos/images. Fine-tuning was applied to the image encoder and mask decoder. We limited training subsets to between 50 and 400 samples per class to better model real-world constraints on data acquisition. The impact of dataset size on fine-tuning performance was evaluated with the weighted mean Dice coefficient (WMDC), and the results were also compared against previously reported state-of-the-art (SOTA) results.
Results: SurgiSAM 2, a fine-tuned SAM 2 model, demonstrated significant improvements in segmentation performance, achieving a 17.9% relative WMDC gain compared to the baseline SAM 2. Increasing prompt points from 1 to 10 and training data scale from 50/class to 400/class enhanced performance; the best WMDC of 0.92 on the validation subset was achieved with 10 prompt points and 400 samples per class. On the test subset, this model outperformed prior SOTA methods in 24/30 (80%) of the classes with a WMDC of 0.91 using 10-point prompts. Notably, SurgiSAM 2 generalized effectively to unseen organ classes, achieving SOTA on 7/9 (77.8%) of them.
Conclusion: SAM 2 achieves remarkable zero-shot and fine-tuned performance for surgical scene segmentation, surpassing prior SOTA models across several organ classes of diverse datasets. This suggests immense potential for enabling automated/semi-automated annotation pipelines, thereby decreasing the annotation burden and facilitating several surgical applications.
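The abstract reports results as a weighted mean Dice coefficient and a relative gain, without spelling out either formula. The sketch below shows one plausible reading: per-class Dice scores averaged with per-class weights, and gain measured relative to the baseline score. The weighting scheme (here, the number of samples per class) is an assumption and may differ from the authors' definition; the function names and example numbers are illustrative only.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (an assumption).
        return 1.0
    return float(2.0 * intersection / denom)


def weighted_mean_dice(per_class_dice, per_class_weight):
    """Weighted mean of per-class Dice scores.

    Assumption: weights are per-class sample counts; the paper's exact
    weighting is not stated in the abstract.
    """
    classes = sorted(per_class_dice)
    scores = np.array([per_class_dice[c] for c in classes], dtype=float)
    weights = np.array([per_class_weight[c] for c in classes], dtype=float)
    return float(np.sum(scores * weights) / np.sum(weights))


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    scores = {"liver": 0.9, "gallbladder": 0.6}
    counts = {"liver": 300, "gallbladder": 100}
    print(weighted_mean_dice(scores, counts))
    print(relative_gain(0.50, 0.60))
```

Under this reading, a class with more evaluation samples contributes proportionally more to the final score, so a few rare, poorly segmented classes lower WMDC less than they would lower an unweighted mean.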
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning SAM 2 for surgical anatomy segmentation in videos/images.
Evaluating dataset size impact on fine-tuning performance using WMDC.
Achieving SOTA in segmentation for unseen organ classes.
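To study the effect of dataset size, the training data is capped at a fixed number of samples per class (50 to 400 in the abstract). A minimal sketch of such a cap is shown below, assuming the data is available as `(sample_id, class_name)` pairs; the function name, the input layout, and the seeded random selection are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def sample_per_class(samples, n_per_class, seed=0):
    """Pick at most `n_per_class` sample ids for each class.

    samples: iterable of (sample_id, class_name) pairs (assumed layout).
    Returns a dict mapping class_name -> sorted list of chosen ids.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    by_class = defaultdict(list)
    for sample_id, class_name in samples:
        by_class[class_name].append(sample_id)

    subset = {}
    for class_name, ids in by_class.items():
        # Classes with fewer than n_per_class samples keep everything.
        k = min(n_per_class, len(ids))
        subset[class_name] = sorted(rng.sample(ids, k))
    return subset


if __name__ == "__main__":
    # Synthetic ids for illustration only.
    data = [(i, "liver") for i in range(500)]
    data += [(1000 + i, "spleen") for i in range(30)]
    for cap in (50, 100, 200, 400):
        chosen = sample_per_class(data, cap)
        print(cap, {name: len(ids) for name, ids in chosen.items()})
```

Repeating the fine-tuning run for each cap and recording WMDC gives the dataset-size curve that the second question refers to.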
Innovation

Methods, ideas, or system contributions that make the work stand out.

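Fine-tuned SAM 2's image encoder and mask decoder for surgical anatomy segmentation.
Limited training subsets to 50 to 400 samples per class to reflect real-world data acquisition constraints.
Outperformed prior SOTA in 24/30 test classes using 10-point prompts.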
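The model is prompted with 1 to 10 points per object. How the authors chose those points is not described here; one common approach in evaluation setups is to sample foreground pixels from the ground-truth mask, sketched below. The function name and the uniform sampling strategy are assumptions. The `(x, y)` coordinate order and the label `1` for foreground follow the convention used by SAM-style predictors.

```python
import numpy as np


def sample_point_prompts(mask, num_points, seed=0):
    """Sample up to `num_points` foreground points from a binary mask.

    Returns (coords, labels): coords has shape (k, 2) in (x, y) order,
    labels has shape (k,) with 1 marking a foreground click.
    Uniform sampling over mask pixels is an assumption, not the paper's
    documented procedure.
    """
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        # No foreground pixels: return empty prompts.
        return np.empty((0, 2), dtype=np.int64), np.empty((0,), dtype=np.int64)

    rng = np.random.default_rng(seed)
    k = min(num_points, len(ys))
    idx = rng.choice(len(ys), size=k, replace=False)
    coords = np.stack([xs[idx], ys[idx]], axis=1).astype(np.int64)
    labels = np.ones(k, dtype=np.int64)
    return coords, labels


if __name__ == "__main__":
    # Toy 10x10 mask with a 4x4 foreground square.
    toy = np.zeros((10, 10), dtype=bool)
    toy[3:7, 2:6] = True
    for n in (1, 5, 10):
        pts, lbls = sample_point_prompts(toy, n)
        print(n, pts.shape, lbls.tolist())
```

The returned arrays are shaped so they could be passed as point coordinates and point labels to a promptable segmentation model; that hand-off is not shown, since the exact predictor interface used in the paper is not given.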
D. N. Kamtam
Division of Thoracic Surgery, Department of Cardiothoracic Surgery, Stanford University School of Medicine, Stanford, California, USA; Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
J. Shrager
Division of Thoracic Surgery, Department of Cardiothoracic Surgery, Stanford University School of Medicine, Stanford, California, USA; Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
S. D. Malla
Division of Thoracic Surgery, Department of Cardiothoracic Surgery, Stanford University School of Medicine, Stanford, California, USA
Xiaohan Wang
Department of Computer Science, Stanford University, Stanford, California, USA
Nicole Lin
Stanford University
cardiothoracic surgery
Juan J. Cardona
Department of Neurosurgery, Stanford University School of Medicine, Stanford, California, USA
S. Yeung-Levy
Department of Biomedical Data Science, Stanford University, Stanford, California, USA; Department of Computer Science, Stanford University, Stanford, California, USA
Clarence Hu
Hotpot.ai, Palo Alto, California, USA