🤖 AI Summary
Medical image segmentation is hindered by the scarcity of high-quality annotated data. Existing diffusion models rely on large-scale datasets and lack anatomical consistency, so they do not effectively alleviate this bottleneck. To address this, we propose a structure-guided, data-efficient framework for fine-tuning diffusion models. The method introduces a dynamic adaptive guidance mask and a lightweight stochastic mask generator, and combines feature-space quality assessment, mask erosion, and hierarchical stochasticity injection to improve both the fidelity and the diversity of synthesized image-mask pairs while preserving anatomical plausibility. Evaluated on five medical segmentation benchmarks, the synthetic pairs improve the Dice score of state-of-the-art segmentation models by an average of 1.0%, with the diffusion model fine-tuned on only a small number of real samples. The approach delivers strong data augmentation performance with high computational efficiency.
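For readers unfamiliar with the metric, the Dice score measures the overlap between a predicted mask and a reference mask as twice the intersection divided by the sum of the two mask areas. The following is a minimal, dependency-free sketch of that definition. The function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are illustrative assumptions, not taken from the MedDiff-FT code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper (hypothetical name); computes 2*|A ∩ B| / (|A| + |B|).
    """
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```

In practice the same formula is usually evaluated on arrays or tensors and averaged over a test set; the list-based version above only shows the arithmetic.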
📝 Abstract
Recent advancements in deep learning for medical image segmentation are often limited by the scarcity of high-quality training data. While diffusion models provide a potential solution by generating synthetic images, their effectiveness in medical imaging remains constrained by their reliance on large-scale medical datasets and by the need for higher image quality. To address these challenges, we present MedDiff-FT, a controllable medical image generation method that fine-tunes a diffusion foundation model to produce medical images with structural dependency and domain specificity in a data-efficient manner. During inference, a dynamic adaptive guiding mask enforces spatial constraints to ensure anatomically coherent synthesis, while a lightweight stochastic mask generator enhances diversity through hierarchical randomness injection. Additionally, an automated quality assessment protocol filters suboptimal outputs using feature-space metrics, followed by mask corrosion (erosion) to refine fidelity. Evaluated on five medical segmentation datasets, MedDiff-FT's synthetic image-mask pairs improve the segmentation performance of state-of-the-art (SOTA) methods by an average of 1% in Dice score. The framework effectively balances generation quality, diversity, and computational efficiency, offering a practical solution for medical data augmentation. The code is available at https://github.com/JianhaoXie1/MedDiff-FT.
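The post-processing stage described above (feature-space filtering followed by mask erosion) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the feature-space metric, so cosine similarity to the mean feature of the real samples is used here as a stand-in, and the names `filter_by_feature_similarity` and `erode`, the `threshold` value, and the 3x3 structuring element are all hypothetical choices.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_feature_similarity(candidates, real_features, threshold=0.8):
    """Return indices of candidate feature vectors close to the real data.

    Assumption: "close" means cosine similarity to the centroid of the
    real features is at least `threshold`. The actual metric used by
    MedDiff-FT may differ.
    """
    dim = len(real_features[0])
    centroid = [
        sum(f[i] for f in real_features) / len(real_features) for i in range(dim)
    ]
    return [
        i for i, feat in enumerate(candidates)
        if cosine_similarity(feat, centroid) >= threshold
    ]


def erode(mask, iterations=1):
    """Binary erosion of a nested-list mask with a 3x3 square element.

    A pixel stays 1 only if it and all 8 neighbours are 1; pixels outside
    the image are treated as background, so the mask shrinks at borders.
    """
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        mask = [
            [
                1 if all(
                    0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                ) else 0
                for x in range(w)
            ]
            for y in range(h)
        ]
    return mask
```

A pipeline in this spirit would extract features for each generated image with a pretrained encoder, keep only the samples returned by the filter, and then erode their masks slightly so that the labelled region stays inside the synthesized structure. Which encoder, threshold, and erosion radius are appropriate is left open here.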