MedSAM3: Delving into Segment Anything with Medical Concepts

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical image segmentation suffers from poor generalizability and heavy reliance on large-scale manual annotations. To address this, we propose a novel paradigm—Medical Promptable Concept Segmentation—and introduce MedSAM-3 Agent: a SAM3-based framework integrated with a multimodal large language model (MLLM) to enable text-prompted, open-vocabulary anatomical structure segmentation. It supports diverse input modalities—including X-ray, MRI, CT, ultrasound, and medical video—and enhances robustness via a closed-loop workflow comprising semantic understanding, prompt generation, segmentation inference, and iterative refinement. Leveraging only a few concept-label pairs for fine-tuning, MedSAM-3 achieves state-of-the-art performance across cross-modal, cross-organ, and cross-device scenarios, significantly outperforming both domain-specific and general-purpose segmentation models. It demonstrates strong clinical generalizability and substantial annotation efficiency gains.
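The closed-loop workflow described above (semantic understanding → prompt generation → segmentation inference → iterative refinement) can be sketched as a simple agent loop. This is a minimal illustration only: all class and function names below are hypothetical placeholders, not the authors' actual API, and the quality score stands in for whatever verification signal the MLLM would produce.

```python
# Hypothetical sketch of an agent-in-the-loop segmentation workflow:
# an MLLM interprets the clinical request, a SAM3-style model segments
# from the resulting text prompt, and the loop refines until accepted.
from dataclasses import dataclass

@dataclass
class Mask:
    concept: str
    quality: float  # stand-in for an MLLM-judged mask quality score

def mllm_understand(request: str) -> str:
    # Placeholder: map a free-form clinical request to a concept prompt.
    return request.lower().strip()

def segment(image, concept: str, attempt: int) -> Mask:
    # Placeholder for a text-promptable segmenter; in this toy version
    # quality simply improves with each refinement attempt.
    return Mask(concept, quality=min(1.0, 0.5 + 0.2 * attempt))

def mllm_accepts(mask: Mask, threshold: float = 0.9) -> bool:
    # Placeholder for the MLLM verifying the candidate mask.
    return mask.quality >= threshold

def agent_segment(image, request: str, max_rounds: int = 5) -> Mask:
    concept = mllm_understand(request)         # semantic understanding
    mask = segment(image, concept, attempt=0)  # prompt -> segmentation
    for attempt in range(1, max_rounds):
        if mllm_accepts(mask):                 # iterative refinement loop
            break
        mask = segment(image, concept, attempt)
    return mask

mask = agent_segment(image=None, request="Left Ventricle")
print(mask.concept, round(mask.quality, 1))  # → left ventricle 0.9
```

The loop terminates either when the MLLM accepts the mask or after a fixed refinement budget, mirroring the "agent-in-the-loop" refinement the abstract describes.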

📝 Abstract
Medical image segmentation is fundamental for biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical applications. Here, we propose MedSAM-3, a text-promptable model for medical image and video segmentation. By fine-tuning the Segment Anything Model (SAM) 3 architecture on medical images paired with semantic concept labels, our MedSAM-3 enables medical Promptable Concept Segmentation (PCS), allowing precise targeting of anatomical structures via open-vocabulary text descriptions rather than solely geometric prompts. We further introduce the MedSAM-3 Agent, a framework that integrates Multimodal Large Language Models (MLLMs) to perform complex reasoning and iterative refinement in an agent-in-the-loop workflow. Comprehensive experiments across diverse medical imaging modalities, including X-ray, MRI, Ultrasound, CT, and video, demonstrate that our approach significantly outperforms existing specialist and foundation models. We will release our code and model at https://github.com/Joey-S-Liu/MedSAM3.
Problem

Research questions and friction points this paper is trying to address.

Addresses limited generalizability in medical image segmentation methods
Reduces reliance on manual annotation through text-promptable segmentation
Enables precise anatomical targeting via open-vocabulary text descriptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned SAM3 for medical concept segmentation
Integrated MLLMs for reasoning and refinement
Enabled open-vocabulary text prompts for anatomy targeting
Anglin Liu
The Hong Kong University of Science and Technology (Guangzhou)
Rundong Xue
Xi’an Jiaotong University
Xu R. Cao
University of Illinois Urbana-Champaign
Yifan Shen
University of Illinois Urbana-Champaign
Yi Lu
The Hong Kong University of Science and Technology (Guangzhou)
Xiang Li
University of Illinois Urbana-Champaign
Qianqian Chen
Southeast University
Jintai Chen
Assistant Professor @ HKUST(GZ)
AI for Healthcare · Multimodal Learning · Deep Tabular Learning