🤖 AI Summary
To address a key bottleneck in few-shot medical image segmentation, namely the heavy reliance on extensive manual annotations and handcrafted prompts, this paper proposes PGP-SAM, a prototype-guided prompt learning framework that enables end-to-end adaptation using only 10% of the 2D slices. The authors introduce (i) a plug-and-play contextual modulation module that integrates multi-scale information and (ii) a class-guided cross-attention mechanism that automatically learns class prototypes and generates high-quality prompts without human intervention or fine-grained annotations, paired with targeted SAM fine-tuning. Evaluated on a public multi-organ dataset and a private ventricle dataset, the approach achieves substantially higher mean Dice scores than existing prompt-free SAM variants, demonstrating both label efficiency and strong generalization across anatomical structures and imaging domains.
📝 Abstract
The Segment Anything Model (SAM) has demonstrated strong and versatile segmentation capabilities, along with intuitive prompt-based interactions. However, customizing SAM for medical image segmentation requires massive amounts of pixel-level annotations and precise point- or box-based prompt designs. To address these challenges, we introduce PGP-SAM, a novel prototype-based few-shot tuning approach that uses limited samples to replace tedious manual prompts. Our key idea is to leverage inter- and intra-class prototypes to capture class-specific knowledge and relationships. We propose two main components: (1) a plug-and-play contextual modulation module that integrates multi-scale information, and (2) a class-guided cross-attention mechanism that fuses prototypes and features for automatic prompt generation. Experiments on a public multi-organ dataset and a private ventricle dataset demonstrate that PGP-SAM achieves superior mean Dice scores compared with existing prompt-free SAM variants, while using only 10% of the 2D slices.
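The class-guided cross-attention described above can be pictured as class prototypes acting as queries over flattened image features, with the attended output serving as per-class prompt embeddings for SAM's mask decoder. The sketch below is a minimal illustration of that idea only; the function name, shapes, and the omission of learned projection layers are assumptions for clarity, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_guided_cross_attention(prototypes, features):
    """Hypothetical sketch: prototypes (C, D) query image features (N, D).

    Returns (C, D) per-class embeddings that could stand in for
    manually designed point/box prompts. Learned Q/K/V projections
    are omitted here for brevity.
    """
    d = prototypes.shape[-1]
    scores = prototypes @ features.T / np.sqrt(d)  # (C, N) similarity
    attn = softmax(scores, axis=-1)                # attention over spatial locations
    return attn @ features                         # (C, D) prompt embeddings

# Toy usage: 4 organ classes, 256-dim features over a 10x10 feature map.
protos = np.random.randn(4, 256)
feats = np.random.randn(100, 256)
prompts = class_guided_cross_attention(protos, feats)  # shape (4, 256)
```

Each row of the output aggregates the feature locations most similar to that class's prototype, which is what lets the prompts be generated automatically instead of being clicked or drawn by a human.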