SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection

📅 2025-03-05

🤖 AI Summary

This study addresses the difficulty of generalizing anatomical tissue segmentation in surgical videos across few-shot, multi-organ, and cross-domain settings. It adapts SAM 2 to surgical scenes by fine-tuning the image encoder and mask decoder on only 50–400 samples per class. Segmentation is guided by point prompts (1–10 points) and evaluated with the weighted mean Dice coefficient (WMDC). Across five public datasets, the fine-tuned model, SurgiSAM 2, achieves a 17.9% relative WMDC gain over the baseline SAM 2, with a best validation WMDC of 0.92. On the test subset, which covers 30 organ classes, it outperforms prior state-of-the-art (SOTA) results in 24 classes and reaches SOTA on 7 of 9 (77.8%) unseen organ classes. The core contribution is a data-efficient fine-tuning recipe for SAM 2 in surgery that improves segmentation accuracy and generalizes to unseen organ classes.
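To make the reported percentages easier to read, here is a minimal sketch of how a relative gain and a per-class success rate are usually computed. The function names `relative_gain` and `rate` are illustrative, and the 0.50 and 0.60 scores are hypothetical values chosen for the arithmetic, not figures from the paper. Only the class counts (24 of 30, 7 of 9) come from the abstract.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (improved - baseline) / baseline


def rate(successes: int, total: int) -> float:
    """Share of classes, in percent, rounded to one decimal place."""
    return round(100.0 * successes / total, 1)


# Hypothetical scores, for illustration only (not taken from the paper):
# moving from 0.50 to 0.60 is a gain of about 20% relative to the baseline.
example_gain = relative_gain(baseline=0.50, improved=0.60)

# Class counts as reported in the abstract.
sota_rate_all = rate(24, 30)    # classes where prior SOTA was exceeded
sota_rate_unseen = rate(7, 9)   # unseen organ classes reaching SOTA
```

Note that a relative gain is measured against the baseline score, so it is not the same as the absolute difference between the two WMDC values.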

📝 Abstract

Background: We evaluate SAM 2 for surgical scene understanding by examining its semantic segmentation capabilities for organs and tissues, both in zero-shot scenarios and after fine-tuning.

Methods: We used five public datasets to evaluate and fine-tune SAM 2 for segmenting anatomical tissues in surgical videos and images. Fine-tuning was applied to the image encoder and mask decoder. We limited training subsets to between 50 and 400 samples per class to better model real-world constraints on data acquisition. The impact of dataset size on fine-tuning performance was evaluated with the weighted mean Dice coefficient (WMDC), and the results were also compared against previously reported state-of-the-art (SOTA) results.

Results: SurgiSAM 2, a fine-tuned SAM 2 model, demonstrated significant improvements in segmentation performance, achieving a 17.9% relative WMDC gain compared to the baseline SAM 2. Increasing the number of prompt points from 1 to 10 and the training data scale from 50 to 400 samples per class improved performance; the best WMDC of 0.92 on the validation subset was achieved with 10 prompt points and 400 samples per class. On the test subset, this model outperformed prior SOTA methods in 24/30 (80%) of the classes, with a WMDC of 0.91 using 10-point prompts. Notably, SurgiSAM 2 generalized effectively to unseen organ classes, achieving SOTA on 7/9 (77.8%) of them.

Conclusion: SAM 2 achieves remarkable zero-shot and fine-tuned performance for surgical scene segmentation, surpassing prior SOTA models across several organ classes in diverse datasets. This suggests strong potential for automated or semi-automated annotation pipelines, which would reduce the annotation burden and facilitate several surgical applications.
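As a reading aid for the WMDC metric, the sketch below computes a Dice coefficient for two binary masks and then a weighted mean over classes. The abstract does not say how the classes are weighted, so weighting by per-class sample count is an assumption here, and the class names and numbers in the example are hypothetical.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]  # 2D binary mask, 1 = foreground, 0 = background


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two same-sized binary masks."""
    inter = pred_sum = target_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += p & t
            pred_sum += p
            target_sum += t
    if pred_sum + target_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * inter / (pred_sum + target_sum)


def weighted_mean_dice(per_class_dice: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    """Weighted mean of per-class Dice scores.

    Assumption: `weights` holds one non-negative weight per class,
    e.g. the number of evaluated samples in that class.
    """
    total = sum(weights[c] for c in per_class_dice)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(per_class_dice[c] * weights[c] for c in per_class_dice) / total


# Hypothetical example: two classes with different sample counts.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
single_dice = dice(pred, target)  # 2 * 1 / (2 + 1)

wmdc = weighted_mean_dice(
    {"liver": 0.9, "gallbladder": 0.6},   # hypothetical per-class Dice
    {"liver": 300, "gallbladder": 100},   # hypothetical sample counts
)
```

With sample-count weights, large classes dominate the score, which is why a weighted mean can differ noticeably from a plain average over classes.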

Problem

Research questions and friction points this paper is trying to address.

Fine-tuning SAM 2 for surgical anatomy segmentation in videos/images.

Evaluating dataset size impact on fine-tuning performance using WMDC.

Achieving SOTA in segmentation for unseen organ classes.
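The dataset-size question above can be pictured as a per-class cap on the training pool. The following is a minimal sketch of that idea, assuming uniform random sampling without replacement; the paper's actual sampling procedure is not described in this summary, and `subsample_per_class`, the class names, and the pool sizes are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def subsample_per_class(samples: Iterable[Tuple[Hashable, str]],
                        max_per_class: int,
                        seed: int = 0) -> Dict[str, List[Hashable]]:
    """Keep at most `max_per_class` sample ids for each class label.

    `samples` is an iterable of (sample_id, class_label) pairs.
    Classes with fewer samples than the cap are kept whole.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_class: Dict[str, List[Hashable]] = defaultdict(list)
    for sample_id, label in samples:
        by_class[label].append(sample_id)

    subset: Dict[str, List[Hashable]] = {}
    for label, ids in by_class.items():
        k = min(max_per_class, len(ids))
        subset[label] = sorted(rng.sample(ids, k))
    return subset


# Hypothetical pool: 500 "liver" frames and 30 "spleen" frames.
pool = [(i, "liver") for i in range(500)] + [(i, "spleen") for i in range(500, 530)]

# The two ends of the range reported in the abstract (50 and 400 per class).
small = subsample_per_class(pool, max_per_class=50)
large = subsample_per_class(pool, max_per_class=400)
```

Repeating the fine-tuning run on subsets of increasing size and recording WMDC for each gives the kind of dataset-size curve the paper evaluates.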

Innovation

Methods, ideas, or system contributions that make the work stand out.

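Fine-tuned the image encoder and mask decoder of SAM 2 for surgical anatomy segmentation.

Limited training subsets to 50–400 samples per class to reflect real-world data constraints.

Outperformed prior SOTA in 24/30 test classes (WMDC 0.91) with 10-point prompts.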
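Point prompts with 1 to 10 points can be simulated from a ground-truth mask. The sketch below draws foreground pixels uniformly at random, which is an assumption: this summary does not describe how the prompt points were chosen. The `(x, y)` coordinate order and the label `1` for a foreground click follow the common convention of SAM-style predictors, and `sample_point_prompts` is a hypothetical helper, not part of any library.

```python
import random
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2D binary mask, 1 = foreground


def sample_point_prompts(mask: Mask,
                         num_points: int,
                         seed: int = 0) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Sample up to `num_points` foreground pixels as (x, y) point prompts.

    Returns (points, labels); every label is 1, marking a foreground click.
    If the mask has fewer foreground pixels than requested, all are returned.
    """
    foreground = [(x, y)
                  for y, row in enumerate(mask)
                  for x, value in enumerate(row)
                  if value]
    if not foreground:
        return [], []
    rng = random.Random(seed)  # fixed seed so prompts are reproducible
    k = min(num_points, len(foreground))
    points = rng.sample(foreground, k)
    return points, [1] * k


# Hypothetical 4x4 mask with a 2x3 foreground block.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

one_point, one_label = sample_point_prompts(mask, num_points=1)
ten_points, ten_labels = sample_point_prompts(mask, num_points=10)  # capped at 6
```

Sweeping `num_points` from 1 to 10 on the same masks mirrors the prompt-count comparison reported in the abstract.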
