ProGiDiff: Prompt-Guided Diffusion-Based Medical Image Segmentation

📅 2026-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes ProGiDiff, a novel framework that integrates pretrained diffusion models with natural language prompts to address key limitations of existing deterministic medical image segmentation methods, which lack support for natural language interaction, multi-proposal generation, and cross-modal transfer. By incorporating a ControlNet-style customized image encoder for conditional guidance, ProGiDiff generates multi-class organ segmentation masks while enabling expert-in-the-loop interactive multi-hypothesis outputs. The framework further introduces low-rank fine-tuning and few-shot adaptation strategies to facilitate efficient cross-modal transfer from CT to MRI. Experimental results demonstrate that ProGiDiff outperforms current methods on CT segmentation tasks and achieves effective generalization to the MRI domain with only a few annotated samples.
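The summary above mentions a ControlNet-style conditioning mechanism: a trainable copy of the encoder processes the conditioning image and injects its features into a frozen pre-trained diffusion model through zero-initialized projections, so training starts from the unmodified pre-trained behavior. The following is a minimal toy sketch of that idea in NumPy, not the paper's implementation; the names `frozen_backbone` and `controlnet_style` and the single-layer setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(x, w):
    # Stand-in for one layer of a pre-trained diffusion U-Net (weights frozen).
    return np.tanh(x @ w)

def controlnet_style(x, cond, w_frozen, w_copy, w_zero):
    # A trainable copy of the backbone layer processes the conditioning image.
    h_cond = np.tanh((x + cond) @ w_copy)
    # Zero-initialized projection ("zero convolution" analogue): at the start
    # of training the conditioned output equals the frozen model's output.
    return frozen_backbone(x, w_frozen) + h_cond @ w_zero

d = 8
w_frozen = rng.standard_normal((d, d)) * 0.1
w_copy = w_frozen.copy()      # trainable branch initialized from pre-trained weights
w_zero = np.zeros((d, d))     # zero-initialized, so conditioning is inert at step 0

x = rng.standard_normal((2, d))      # noisy latent
cond = rng.standard_normal((2, d))   # conditioning image features

# Before any training, the conditioned path leaves the pre-trained model intact:
assert np.allclose(controlnet_style(x, cond, w_frozen, w_copy, w_zero),
                   frozen_backbone(x, w_frozen))
```

The zero initialization is the key design choice: gradients flow into the trainable copy, but the pre-trained model's outputs are untouched until the conditioning branch has learned something useful.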

📝 Abstract
Widely adopted medical image segmentation methods, although efficient, are primarily deterministic and poorly amenable to natural language prompts. As a result, they cannot produce multiple proposals, support human interaction, or adapt across modalities. Recently, text-to-image diffusion models have shown potential to bridge this gap. However, training them from scratch requires a large dataset, a limitation for medical image segmentation. Furthermore, they are often restricted to binary segmentation and cannot be conditioned on a natural language prompt. To this end, we propose a novel framework called ProGiDiff that leverages existing image generation models for medical image segmentation. Specifically, we propose a ControlNet-style conditioning mechanism with a custom encoder, suitable for image conditioning, to steer a pre-trained diffusion model to output segmentation masks. It naturally extends to a multi-class setting simply by prompting for the target organ. Our experiments on organ segmentation from CT images demonstrate strong performance compared to previous methods, and the multiple proposals can greatly benefit an expert-in-the-loop setting. Importantly, we demonstrate that the learned conditioning mechanism can be easily transferred through low-rank, few-shot adaptation to segment MR images.
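The abstract's "low-rank, few-shot adaptation" for CT-to-MRI transfer follows the general LoRA recipe: freeze the learned weights and train only a small low-rank update. Below is a minimal NumPy sketch of that recipe, under the usual assumptions (rank-r factors B and A, with B zero-initialized so adaptation starts from the CT-trained behavior); it is an illustration of the technique, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 64, 64, 4   # illustrative sizes; r is the low rank

W = rng.standard_normal((d_in, d_out))   # frozen weight learned on CT
A = rng.standard_normal((r, d_out)) * 0.01  # trainable down-projection output
B = np.zeros((d_in, r))                  # zero init: no change before adaptation

def lora_forward(x):
    # Frozen path plus the low-rank correction B @ A.
    return x @ W + x @ B @ A

x = rng.standard_normal((3, d_in))

# Before few-shot adaptation the layer behaves exactly like the CT model:
assert np.allclose(lora_forward(x), x @ W)

# Only the low-rank factors are trained: r*(d_in + d_out) = 512 parameters
# versus d_in*d_out = 4096 for full fine-tuning of this layer.
assert B.size + A.size < 0.15 * W.size
```

Because only B and A receive gradients, a handful of annotated MR slices can suffice to adapt the conditioning mechanism, which is the efficiency argument the abstract makes.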
Problem

Research questions and friction points this paper is trying to address.

medical image segmentation
natural language prompt
diffusion models
cross-modality adaptation
multi-class segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt-guided diffusion
Medical image segmentation
ControlNet-style conditioning
Cross-modality adaptation
Few-shot adaptation
Authors

Yuan Lin (Ocean College, Zhejiang University) · Rheology, Polymer physics, Multi-phase flow
Murong Xu (University of Zurich, Switzerland)
Marc Hölle (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany)
Chinmay Prabhakar (University of Zurich, Switzerland)
Andreas Maier (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany)
Vasileios Belagiannis (Professor, Friedrich-Alexander-Universität Erlangen-Nürnberg) · Machine Learning, Computer Vision, Robotics
Bjoern H Menze (University of Zurich, Switzerland)
Suprosanna Shit (University of Zurich | ETH AI Center) · Machine Learning, Medical Imaging, Computer Vision, Signal Processing