MedDiff-FT: Data-Efficient Diffusion Model Fine-tuning with Structural Guidance for Controllable Medical Image Synthesis

📅 2025-06-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Medical image segmentation is hindered by the scarcity of high-quality annotated data. Existing diffusion models rely on large-scale datasets and lack anatomical consistency, so they do little to relieve this bottleneck. To address this, we propose a structure-guided, data-efficient diffusion model fine-tuning framework. Our method introduces a dynamic adaptive guidance mask and a lightweight stochastic mask generator, and combines feature-space quality assessment, mask erosion optimization, and hierarchical stochasticity injection to improve both the fidelity and the diversity of synthesized image-mask pairs while preserving anatomical plausibility. Evaluated on five medical segmentation benchmarks, the synthesized pairs improve state-of-the-art segmentation models by an average of 1.0% in Dice score, with the diffusion model fine-tuned on only a small number of real samples. The approach delivers strong data augmentation performance with high computational efficiency.
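For readers unfamiliar with the metric behind the reported gain, the following is a minimal sketch of the Dice score for binary masks. It is not taken from the MedDiff-FT code; the function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values with the same length.
    Dice = 2 * |pred AND target| / (|pred| + |target|).
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Predicted mask covers two pixels, ground truth covers one of them.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice the same formula is usually evaluated on NumPy arrays or tensors; the pure-Python version here only keeps the sketch dependency-free.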

📝 Abstract

Recent advancements in deep learning for medical image segmentation are often limited by the scarcity of high-quality training data. While diffusion models provide a potential solution by generating synthetic images, their effectiveness in medical imaging remains constrained due to their reliance on large-scale medical datasets and the need for higher image quality. To address these challenges, we present MedDiff-FT, a controllable medical image generation method that fine-tunes a diffusion foundation model to produce medical images with structural dependency and domain specificity in a data-efficient manner. During inference, a dynamic adaptive guiding mask enforces spatial constraints to ensure anatomically coherent synthesis, while a lightweight stochastic mask generator enhances diversity through hierarchical randomness injection. Additionally, an automated quality assessment protocol filters suboptimal outputs using feature-space metrics, followed by mask corrosion to refine fidelity. Evaluated on five medical segmentation datasets, MedDiff-FT's synthetic image-mask pairs improve the segmentation performance of SOTA methods by an average of 1% in Dice score. The framework effectively balances generation quality, diversity, and computational efficiency, offering a practical solution for medical data augmentation. The code is available at https://github.com/JianhaoXie1/MedDiff-FT.
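The "mask corrosion" step reads like morphological erosion (the AI summary calls it mask erosion). Assuming that interpretation, the sketch below shrinks a binary mask by one pixel per iteration so that the mask stays inside the synthesized structure. The function name `erode`, the 4-neighbour structuring element, and the iteration count are illustrative assumptions, not details confirmed by the abstract.

```python
def erode(mask, iterations=1):
    """Binary erosion of a 2D 0/1 mask (list of rows).

    A pixel survives only if it and its four direct neighbours are all
    foreground. Pixels outside the image count as background.
    """
    height, width = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                if not mask[y][x]:
                    continue
                neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                keep = all(
                    0 <= ny < height and 0 <= nx < width and mask[ny][nx]
                    for ny, nx in neighbours
                )
                out[y][x] = 1 if keep else 0
        mask = out
    return mask


# A 3x3 foreground block inside a 5x5 image.
block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in erode(block):
    print(row)
```

With real images, `scipy.ndimage.binary_erosion` performs the same operation far more efficiently; its default structuring element and border handling correspond to the choices made in this sketch.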

Problem

Research questions and friction points this paper is trying to address.

Limited high-quality medical training data availability

Diffusion models need large datasets for medical imaging

Ensuring anatomical coherence in synthetic medical images

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes diffusion model with structural guidance

Uses dynamic adaptive mask for anatomical coherence

Automated quality assessment refines output fidelity
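The page does not say which feature-space metric the quality assessment uses. As a rough illustration of the idea only, the sketch below keeps a synthetic sample when its feature vector lies within a chosen Euclidean distance of the centroid of the real samples' features. The function names, the centroid-distance criterion, and the `max_dist` threshold are all hypothetical stand-ins, not the paper's protocol.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


def filter_by_feature_distance(real_feats, synth_feats, max_dist):
    """Return indices of synthetic samples close to the real-feature centroid.

    Hypothetical criterion: Euclidean distance to the centroid <= max_dist.
    """
    centre = centroid(real_feats)
    return [
        index
        for index, feat in enumerate(synth_feats)
        if math.dist(feat, centre) <= max_dist
    ]


# Toy 2D features; in practice these would come from a pretrained encoder.
real = [[0.0, 0.0], [2.0, 0.0]]
synthetic = [[1.0, 0.5], [5.0, 5.0], [1.5, 0.0]]
print(filter_by_feature_distance(real, synthetic, max_dist=1.0))
```

The surviving indices would then select which image-mask pairs are passed on to the segmentation model as augmentation data.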

🔎 Similar Papers

MediSyn: A Generalist Text-Guided Latent Diffusion Model For Diverse Medical Image Synthesis