🤖 AI Summary
To address the scarcity of high-quality annotations and low label efficiency in multi-organ CT segmentation, this paper proposes a knowledge transfer framework based on self-supervised pre-trained diffusion models (DDPMs). It is the first work to introduce unsupervised pre-trained diffusion models into few-shot medical image segmentation. We design a dual-path fine-tuning strategy—comprising a linear classification probe and a segmentation head—to preserve feature representation robustness under extremely limited annotation budgets. Experiments demonstrate that our method achieves a Dice Similarity Coefficient (DSC) of 51.81% using only four annotated slices; with 1% and 10% labeled data, it attains DSC scores of 71.56% and 78.51%, respectively—substantially outperforming existing few-shot approaches. This work establishes a novel paradigm for few-shot medical image segmentation by leveraging diffusion-based self-supervised pretraining and efficient adaptation.
📝 Abstract
Accurate segmentation of multiple organs in Computed Tomography (CT) images plays a vital role in computer-aided diagnosis systems. While various supervised learning approaches have been proposed recently, these methods heavily depend on a large amount of high-quality labeled data, which are expensive to obtain in practice. To address this challenge, we propose a label-efficient framework using knowledge transfer from a pre-trained diffusion model for CT multi-organ segmentation. Specifically, we first pre-train a denoising diffusion model on 207,029 unlabeled 2D CT slices to capture anatomical patterns. Then, the model backbone is transferred to the downstream multi-organ segmentation task, followed by fine-tuning with few labeled data. In fine-tuning, two fine-tuning strategies, linear classification and fine-tuning decoder, are employed to enhance segmentation performance while preserving learned representations. Quantitative results show that the pre-trained diffusion model is capable of generating diverse and realistic 256x256 CT images (Fr'echet inception distance (FID): 11.32, spatial Fr'echet inception distance (sFID): 46.93, F1-score: 73.1%). Compared to state-of-the-art methods for multi-organ segmentation, our method achieves competitive performance on the FLARE 2022 dataset, particularly in limited labeled data scenarios. After fine-tuning with 1% and 10% labeled data, our method achieves dice similarity coefficients (DSCs) of 71.56% and 78.51%, respectively. Remarkably, the method achieves a DSC score of 51.81% using only four labeled CT slices. These results demonstrate the efficacy of our approach in overcoming the limitations of supervised learning approaches that is highly dependent on large-scale labeled data.