🤖 AI Summary
The absence of general-purpose foundation models and systematic user studies for 3D medical image and video segmentation hinders broad clinical and research adoption.
Method: We introduce MedSAM2, the first promptable foundation model for 3D medical image and video segmentation. We establish a large-scale promptable segmentation paradigm tailored to 3D medical data; design a human-in-the-loop annotation pipeline that enables the largest multi-center, multi-modal medical user study to date; fine-tune SAM2 with integrated 3D convolutions and spatiotemporal attention; and train on over 455,000 3D image–mask pairs and 76,000 video frames.
Contribution/Results: The model achieves generalizable, zero-shot segmentation across organs, pathologies, and modalities (CT, MRI, echocardiography), significantly outperforming state-of-the-art methods on multiple benchmarks. Annotation effort is reduced by over 85%. Deployed on both local and cloud platforms, it supports scalable, efficient clinical and research applications.
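The promptable, video-style workflow above can be illustrated with a toy sketch: a single click on one slice seeds a 2D mask, which is then propagated slice by slice through the volume, each new slice prompted by the previous mask. Everything here is a hypothetical stand-in (a flood-fill "model" and invented function names); the real system uses a fine-tuned SAM2 network, not thresholding.

```python
# Toy sketch of prompt-driven, slice-to-slice mask propagation in a 3D
# volume. The flood-fill "segmenter" below is a hypothetical stand-in
# for one promptable 2D prediction by the real network.

def segment_slice(slice2d, seed, tol=10):
    """Flood-fill from a seed point, keeping voxels whose intensity is
    within `tol` of the seed voxel -- stand-in for one 2D prediction."""
    h, w = len(slice2d), len(slice2d[0])
    ref = slice2d[seed[0]][seed[1]]
    mask = [[False] * w for _ in range(h)]
    stack = [seed]
    while stack:
        r, c = stack.pop()
        if 0 <= r < h and 0 <= c < w and not mask[r][c] \
                and abs(slice2d[r][c] - ref) <= tol:
            mask[r][c] = True
            stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return mask

def centroid(mask):
    """Centroid of a binary mask, used as the prompt for the next slice."""
    pts = [(r, c) for r, row in enumerate(mask)
           for c, v in enumerate(row) if v]
    if not pts:
        return None
    return (sum(p[0] for p in pts) // len(pts),
            sum(p[1] for p in pts) // len(pts))

def propagate(volume, slice_idx, click):
    """One click on slice `slice_idx` seeds that slice; neighbouring
    slices are then prompted with the previous mask's centroid."""
    masks = {slice_idx: segment_slice(volume[slice_idx], click)}
    for step in (1, -1):                      # forward, then backward
        prompt, i = click, slice_idx + step
        while 0 <= i < len(volume):
            prompt = centroid(masks[i - step]) or prompt
            masks[i] = segment_slice(volume[i], prompt)
            i += step
    return [masks[i] for i in range(len(volume))]
```

This mirrors why a video model fits 3D data: slices are treated like frames, and the memory of the previous mask replaces repeated manual prompting on every slice.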
📝 Abstract
Medical image and video segmentation is a critical task for precision medicine, which has witnessed considerable progress in developing task- or modality-specific and generalist models for 2D images. However, there have been limited studies on building general-purpose models for 3D images and videos with comprehensive user studies. Here, we present MedSAM2, a promptable segmentation foundation model for 3D image and video segmentation. The model is developed by fine-tuning the Segment Anything Model 2 on a large medical dataset with over 455,000 3D image–mask pairs and 76,000 frames, outperforming previous models across a wide range of organs, lesions, and imaging modalities. Furthermore, we implement a human-in-the-loop pipeline to facilitate the creation of large-scale datasets, resulting in, to the best of our knowledge, the most extensive user study to date, involving the annotation of 5,000 CT lesions, 3,984 liver MRI lesions, and 251,550 echocardiogram video frames. This study demonstrates that MedSAM2 can reduce manual costs by more than 85%. MedSAM2 is also integrated into widely used platforms with user-friendly interfaces for local and cloud deployment, making it a practical tool for supporting efficient, scalable, and high-quality segmentation in both research and healthcare environments.
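The reported >85% reduction in manual cost comes from comparing annotation time with and without model assistance: annotators correct model predictions instead of contouring from scratch. A minimal sketch of that bookkeeping, with invented per-case times purely for illustration (the function name and numbers are not from the paper):

```python
# Hypothetical illustration of how a human-in-the-loop time saving is
# computed: total time to correct model outputs vs. total time to
# annotate the same cases from scratch. All numbers are invented.

def time_saving(scratch_times, correction_times):
    """Percentage reduction in total annotation time when annotators
    edit model predictions instead of contouring from scratch."""
    scratch = sum(scratch_times)
    corrected = sum(correction_times)
    return 100.0 * (scratch - corrected) / scratch

# e.g. five lesions: minutes from scratch vs. minutes to fix predictions
scratch = [10.0, 12.0, 8.0, 15.0, 11.0]   # 56 min total, fully manual
fixes = [1.5, 1.0, 1.0, 2.5, 1.5]         # 7.5 min total, model-assisted
print(round(time_saving(scratch, fixes), 1))  # -> 86.6
```

With these illustrative numbers the saving exceeds 85%, matching the magnitude of reduction reported in the user study.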