🤖 AI Summary
In medical image segmentation, the Segment Anything Model (SAM) relies on manual prompts (e.g., points or bounding boxes), yet such human-provided priors are often unavailable in clinical practice, severely hindering real-world deployment. To address this, we propose the first external-prompt-free hierarchical self-prompting framework, enabling SAM to autonomously generate learnable, abstract semantic prompts (beyond mere positional cues) for fully automatic segmentation. Methodologically, we design a hierarchical feature-guided prompt generation module, jointly optimizing abstract prompt embedding learning and multimodal adaptive fine-tuning within the SAM architecture to achieve cross-modal generalization. Evaluated on polyp and skin lesion segmentation tasks, our approach significantly outperforms state-of-the-art methods, achieving up to a 14.04% improvement on challenging benchmarks, while demonstrating strong generalization to unseen datasets.
📝 Abstract
Although the Segment Anything Model (SAM) is highly effective in natural image segmentation, it depends on manual prompts, which limits its applicability to medical imaging, where such prompts are often unavailable. Existing efforts to fine-tune SAM for medical segmentation typically struggle to remove this dependency. We propose Hierarchical Self-Prompting SAM (HSP-SAM), a novel self-prompting framework that enables SAM to achieve strong performance in prompt-free medical image segmentation. Unlike previous self-prompting methods, which remain limited to positional prompts like vanilla SAM, we are the first to introduce the learning of abstract prompts during the self-prompting process. This simple and intuitive self-prompting framework achieves superior performance on classic segmentation tasks such as polyp and skin lesion segmentation, while maintaining robustness across diverse medical imaging modalities. Furthermore, it exhibits strong generalization to unseen datasets, achieving improvements of up to 14.04% over previous state-of-the-art methods on some challenging benchmarks. These results suggest that abstract prompts encapsulate richer and higher-dimensional semantic information than positional prompts, thereby enhancing the model's robustness and generalization. All models and code will be released upon acceptance.
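The core idea, replacing SAM's externally supplied point/box prompts with abstract prompt embeddings generated from the image features themselves, can be sketched in a few lines. The sketch below is a hypothetical toy illustration under assumed shapes and names (`prompt_generator`, `mask_decoder`, the 32-dim embedding), not the paper's actual implementation; it only shows the data flow in which a learnable query, mixed with pooled image context, stands in for human-provided prompts.

```python
import numpy as np

rng = np.random.default_rng(0)

def prompt_generator(feats, W, learned_query):
    # Hypothetical self-prompting step: pool image features, project them,
    # and mix them with a learnable abstract query embedding, so no
    # human-provided points or boxes are needed.
    ctx = feats.mean(axis=0) @ W          # (d,) pooled image context
    return learned_query + ctx            # abstract prompt embedding

def mask_decoder(feats, prompt):
    # Stand-in for SAM's mask decoder: attention of the abstract prompt
    # over patch tokens, reshaped to a coarse spatial map.
    scores = feats @ prompt               # (num_patches,)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn.reshape(14, 14)

# Toy "image encoder output": 196 patch tokens, 32-dim each.
feats = rng.standard_normal((196, 32))
W = rng.standard_normal((32, 32)) * 0.02
learned_query = rng.standard_normal(32)   # trained jointly with the decoder

prompt = prompt_generator(feats, W, learned_query)
mask = mask_decoder(feats, prompt)
print(mask.shape)  # (14, 14)
```

In the actual framework the prompt generator is hierarchical (driven by multi-scale encoder features) and is optimized jointly with adaptive fine-tuning of SAM; the sketch collapses that to a single pooled projection purely for readability.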