🤖 AI Summary
Existing medical diffusion models are typically constrained to a single anatomical region, task, or dataset, limiting their generalizability and clinical utility. To address this, we propose MedDiff-FM, the first 3D medical diffusion foundation model designed for multi-anatomy (head-to-abdomen), multi-task learning. Our method introduces: (1) a general-purpose medical diffusion foundation architecture; (2) anatomy-aware positional embeddings and structure-guided modeling; and (3) joint image-level and patch-level representation learning. Together these enable a single pretrained model, plus lightweight ControlNet fine-tuning, to support six key tasks: denoising, anomaly detection, synthesis, super-resolution, lesion generation, and inpainting. Pretrained on multi-center CT data spanning anatomical domains, MedDiff-FM achieves significant improvements on multiple public benchmarks (+2.1 dB PSNR, +5.3% AUROC, and improved synthesis fidelity) and demonstrates strong few-shot adaptability to downstream tasks.
📝 Abstract
Diffusion models have achieved significant success in both the natural image and medical image domains, spanning a wide range of applications. Previous work on medical images has often been constrained to specific anatomical regions, particular applications, and limited datasets, resulting in isolated diffusion models. This paper introduces MedDiff-FM, a diffusion-based foundation model that addresses a diverse range of medical image tasks. MedDiff-FM pre-trains on 3D CT images from multiple publicly available datasets, covering anatomical regions from head to abdomen, and explores the capabilities of the resulting foundation model across a variety of application scenarios. The foundation model performs integrated image processing at both the image level and the patch level, uses position embeddings to establish multi-level spatial relationships, and leverages region classes and anatomical structures to capture specific anatomical regions. MedDiff-FM handles several downstream tasks seamlessly, including image denoising, anomaly detection, and image synthesis. It can also perform super-resolution, lesion generation, and lesion inpainting by rapidly fine-tuning the foundation model with ControlNet under task-specific conditions. Experimental results demonstrate the effectiveness of MedDiff-FM across diverse downstream medical image tasks.
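The abstract does not specify how the multi-level position embeddings are computed. As a minimal sketch of the general idea, a transformer-style sinusoidal embedding of a patch's normalized 3D offset within the full volume can tie patch-level crops back to image-level context. The function names, the per-axis split of the embedding dimension, and the normalization scheme below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def sinusoidal_embedding(pos, dim):
    """Standard sinusoidal embedding of a scalar position.

    `pos` is a (normalized) coordinate; `dim` must be even.
    Returns a list of `dim` interleaved sin/cos values.
    """
    emb = []
    for i in range(dim // 2):
        # Geometrically spaced frequencies, as in the Transformer paper.
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.append(math.sin(pos * freq))
        emb.append(math.cos(pos * freq))
    return emb

def patch_position_embedding(patch_origin, image_shape, dim):
    """Encode a patch's 3D offset within the whole volume (assumption:
    one sinusoidal embedding per axis, concatenated), so a model can
    relate patch-level crops to image-level spatial context.
    """
    per_axis = dim // 3  # split the budget evenly across z, y, x
    emb = []
    for origin, size in zip(patch_origin, image_shape):
        emb.extend(sinusoidal_embedding(origin / size, per_axis))
    return emb
```

In a setup like this, the embedding would be added to (or concatenated with) the diffusion network's timestep embedding, so the same backbone can denoise either whole downsampled volumes or local patches while remaining aware of where each patch sits anatomically.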