Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the heavy reliance of foundation models such as SAM on large-scale annotated data and expert-crafted prompts in medical image segmentation, this paper proposes a semi-supervised framework tailored to low-annotation regimes. Methodologically, it integrates contrastive vision-language pretraining, prompt generation guided by visual question answering, and lightweight adapter-based fine-tuning. Its key contributions are: (i) the first unsupervised semantic prompt generation mechanism, eliminating manual prompt engineering; and (ii) a direct preference optimization (DPO) strategy driven by a virtual annotator, which bypasses explicit reward modeling and human-in-the-loop feedback. Evaluated across medical imaging modalities including X-ray, ultrasound, and abdominal CT, the framework achieves state-of-the-art performance on lung, breast tumor, and organ segmentation, with substantial gains in robustness and generalization using fewer than 1% labeled samples.

📝 Abstract
Foundational models such as the Segment Anything Model (SAM) are gaining traction in medical imaging segmentation, supporting multiple downstream tasks. However, such models are supervised in nature, still relying on large annotated datasets or prompts supplied by experts. Conventional techniques such as active learning to alleviate such limitations are limited in scope and still necessitate continuous human involvement and complex domain knowledge for label refinement or establishing reward ground truth. To address these challenges, we propose an enhanced Segment Anything Model (SAM) framework that utilizes annotation-efficient prompts generated in a fully unsupervised fashion, while still capturing essential semantic, location, and shape information through contrastive language-image pretraining and visual question answering. We adopt the direct preference optimization technique to design an optimal policy that enables the model to generate high-fidelity segmentations with simple ratings or rankings provided by a virtual annotator simulating the human annotation process. State-of-the-art performance of our framework in tasks such as lung segmentation, breast tumor segmentation, and organ segmentation across various modalities, including X-ray, ultrasound, and abdominal CT, justifies its effectiveness in low-annotation data scenarios.
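The abstract's preference-optimization step replaces reward modeling with pairwise rankings from a virtual annotator. A minimal sketch of the standard pairwise DPO loss this implies, assuming per-mask log-likelihoods are available from the fine-tuned SAM policy and a frozen reference copy (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Pairwise DPO loss for one (preferred, rejected) segmentation pair.

    logp_w / logp_l     : log-likelihood of the preferred / rejected mask
                          under the policy being fine-tuned.
    ref_logp_w / ref_logp_l : same quantities under the frozen reference model.
    beta                : strength of the implicit KL constraint to the reference.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin): loss shrinks as the policy prefers the winner
    # more strongly than the reference does
    return -np.log(1.0 / (1.0 + np.exp(-margin)))
```

When the policy and reference agree, the margin is zero and the loss sits at log 2; raising the preferred mask's likelihood relative to the reference drives it down, which is the effect the virtual annotator's rankings are meant to produce.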
Problem

Research questions and friction points this paper is trying to address.

Heavy reliance on large annotated datasets for medical image segmentation.
Lack of unsupervised prompting that efficiently captures semantic and shape information.
Achieving accurate segmentation with minimal human annotation via virtual annotators.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised prompt generation for SAM
Contrastive language-image pretraining integration
Direct preference optimization for segmentation
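The paper does not spell out the prompt-generation mechanics here, but an unsupervised semantic prompt of the kind described could be sketched as thresholding patch-to-text similarity from a CLIP-style vision-language encoder into a SAM box prompt (all names, the min-max normalization, and the threshold heuristic are assumptions for illustration):

```python
import numpy as np

def box_prompt_from_similarity(patch_embs, text_emb, grid, threshold=0.5):
    """Derive a SAM-style box prompt from patch/text similarity.

    patch_embs : (H*W, D) patch embeddings from a vision-language encoder.
    text_emb   : (D,) embedding of a class description, e.g. "left lung".
    grid       : (H, W) shape of the patch grid.
    Returns (x0, y0, x1, y1) in patch coordinates, or None if no patch matches.
    """
    h, w = grid
    p = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    sim = (p @ t).reshape(h, w)
    # min-max normalize similarities, then keep patches above the threshold
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
    ys, xs = np.where(sim >= threshold)
    if len(xs) == 0:
        return None
    return (xs.min(), ys.min(), xs.max(), ys.max())
```

The returned box could then be scaled to pixel coordinates and passed to SAM's prompt encoder, so the entire prompt is derived from a text description rather than a human click.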