🤖 AI Summary
Foundation segmentation models such as SAM generalize poorly from natural images to medical image segmentation, and adapting them via full-parameter fine-tuning is prohibitively expensive. To address these challenges, this paper proposes a lightweight adapter that is modularly integrated into SAM's mask decoder, training fewer than 1% of the original parameters. The method combines parameter-efficient fine-tuning (PEFT) with multi-source medical imaging domain alignment, and supports both fully supervised training and test-time adaptation. Evaluated on four benchmark medical datasets, the proposed approach achieves state-of-the-art segmentation accuracy, significantly outperforming existing domain adaptation methods, while substantially reducing GPU memory consumption and training overhead relative to full fine-tuning.
📝 Abstract
This paper addresses the domain adaptation challenge for semantic segmentation in medical imaging. Despite the impressive performance of recent foundation segmentation models like SAM on natural images, they struggle with medical domain images. Moreover, approaches that fine-tune these models end to end are computationally intractable. To address this, we propose a novel SAM adapter that minimizes the number of trainable parameters while achieving performance comparable to full fine-tuning. The proposed SAM adapter is strategically placed in the mask decoder, offering broad generalization and improved segmentation across both fully supervised and test-time domain adaptation tasks. Extensive validation on four datasets showcases the adapter's efficacy: it outperforms existing methods while training less than 1% of SAM's total parameters.
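The abstract does not specify the adapter's internal design, but decoder adapters of this kind are typically residual bottleneck modules: a down-projection to a small hidden size, a nonlinearity, and an up-projection added back to the input features. The sketch below illustrates that general pattern in NumPy with entirely hypothetical dimensions (feature size 256, bottleneck 16, a 90M-parameter frozen backbone); it is not the authors' exact architecture.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class BottleneckAdapter:
    """Residual bottleneck adapter: y = x + relu(x @ W_down) @ W_up.

    Only W_down and W_up are trainable; the surrounding backbone
    (here only implied) stays frozen.
    """

    def __init__(self, dim, bottleneck, rng):
        # Small random down-projection into the bottleneck.
        self.w_down = rng.standard_normal((dim, bottleneck)) * 0.02
        # Zero-initialized up-projection: the adapter starts as an
        # identity map, so frozen-model behavior is preserved at step 0.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        return x + relu(x @ self.w_down) @ self.w_up

    def num_params(self):
        return self.w_down.size + self.w_up.size


rng = np.random.default_rng(0)
dim, bottleneck = 256, 16                  # hypothetical sizes
adapter = BottleneckAdapter(dim, bottleneck, rng)

x = rng.standard_normal((4, dim))          # a batch of decoder features
y = adapter(x)
assert y.shape == x.shape                  # shape-preserving, drop-in module
assert np.allclose(y, x)                   # identity before any training

# Trainable fraction relative to a hypothetical 90M-parameter backbone.
frozen_backbone_params = 90_000_000
fraction = adapter.num_params() / frozen_backbone_params
print(f"trainable fraction: {fraction:.6f}")
```

The zero-initialized up-projection is a common adapter design choice: it guarantees the fine-tuned model starts from exactly the pretrained model's outputs, and the tiny bottleneck keeps the trainable-parameter fraction far below 1% of the backbone.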