🤖 AI Summary
SAM exhibits significant performance degradation on microscopic and medical images and relies heavily on manual prompting, hindering its deployment in automated biomedical applications. To address this, we propose Prompt-Tuned SAM (PTSAM), a lightweight adaptation framework that optimizes only learnable prompt embeddings within the mask decoder, keeping the backbone network frozen, to achieve domain-specific segmentation. Additionally prompt-tuning the image encoder yields up to an 18% accuracy improvement. PTSAM requires as few as 16 annotated images and only 2,048 trainable parameters, approximately 2,000× fewer than full fine-tuning. Evaluated across multiple microscopic and medical imaging benchmarks, PTSAM matches or surpasses state-of-the-art methods, effectively bridging the cross-domain performance gap.
📝 Abstract
The Segment Anything Model (SAM) is widely used for segmenting a diverse range of objects in natural images from simple user prompts like points or bounding boxes. However, SAM's performance decreases substantially when applied to non-natural domains like microscopic imaging. Furthermore, due to SAM's interactive design, it requires a precise prompt for each image and object, which is infeasible in many automated biomedical applications. Previous solutions adapt SAM by training millions of parameters, fine-tuning either large parts of the model or adapter layers. In contrast, we show that as few as 2,048 additional parameters are sufficient for turning SAM into a use-case specialist for a certain downstream task. Our novel PTSAM (prompt-tuned SAM) method uses prompt-tuning, a parameter-efficient fine-tuning technique, to adapt SAM for a specific task. We validate the performance of our approach on multiple microscopic datasets and one medical dataset. Our results show that prompt-tuning only SAM's mask decoder already leads to performance on par with state-of-the-art techniques while requiring roughly 2,000× fewer trainable parameters. For addressing domain gaps, we find that additionally prompt-tuning SAM's image encoder is beneficial, further improving segmentation accuracy by up to 18% over state-of-the-art results. Since PTSAM can be reliably trained with as few as 16 annotated images, we find it particularly helpful for applications with limited training data and domain shifts.
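The core idea of prompt-tuning can be sketched in a few lines: freeze every weight of a pretrained transformer and train only a small set of learnable prompt tokens that are concatenated to the input sequence. The sketch below is illustrative, not the authors' implementation; the toy backbone, its sizes, and all names are assumptions, except that 8 tokens × 256 dimensions = 2,048 matches the trainable-parameter count quoted in the abstract.

```python
import torch
import torch.nn as nn

class PromptTunedEncoder(nn.Module):
    """Toy transformer standing in for a frozen SAM component,
    adapted via learnable prompt embeddings (hypothetical sketch)."""

    def __init__(self, dim: int = 256, num_prompts: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Freeze the entire backbone: none of its weights receive gradients.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # The only trainable parameters: num_prompts learnable prompt tokens.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim); prepend the prompts to each sequence.
        batch = tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        return self.backbone(torch.cat([prompts, tokens], dim=1))

model = PromptTunedEncoder()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 8 tokens * 256 dims = 2048 trainable parameters
```

During training, an optimizer is given only `model.prompts`, so the downstream task is learned entirely through those 2,048 values while the pretrained weights stay intact.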