MedDiff-FT: Data-Efficient Diffusion Model Fine-tuning with Structural Guidance for Controllable Medical Image Synthesis

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical image segmentation is hindered by the scarcity of high-quality annotated data. Existing diffusion models rely on large-scale datasets and lack sufficient anatomical consistency, so they fail to effectively alleviate this bottleneck. To address this, we propose a structure-guided, data-efficient diffusion model fine-tuning framework. Our method introduces a dynamic adaptive guidance mask and a lightweight stochastic mask generator, integrating feature-space quality assessment, mask erosion optimization, and hierarchical stochasticity injection to jointly enhance the fidelity and diversity of synthesized image-mask pairs while preserving anatomical plausibility. Evaluated on five medical segmentation benchmarks, our approach, which is fine-tuned with only a small number of real samples, produces synthetic data that improves the Dice score of state-of-the-art segmentation models by 1.0% on average. It delivers strong data augmentation performance with high computational efficiency.
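
For readers unfamiliar with the metric quoted in the summary, the Dice score between a predicted mask A and a ground-truth mask B is 2|A∩B| / (|A| + |B|). The following is a minimal NumPy sketch of that standard formula, not code from the MedDiff-FT repository. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    Hypothetical helper for illustration; returns 1.0 when both masks
    are empty (an assumed convention, other choices are possible).
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * intersection / denom)


# Two 2x3 masks that overlap on 2 of their 3 foreground pixels each.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

In practice the score is averaged over a test set, so the reported gain is a difference between two such averages.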

📝 Abstract
Recent advancements in deep learning for medical image segmentation are often limited by the scarcity of high-quality training data. While diffusion models provide a potential solution by generating synthetic images, their effectiveness in medical imaging remains constrained due to their reliance on large-scale medical datasets and the need for higher image quality. To address these challenges, we present MedDiff-FT, a controllable medical image generation method that fine-tunes a diffusion foundation model to produce medical images with structural dependency and domain specificity in a data-efficient manner. During inference, a dynamic adaptive guiding mask enforces spatial constraints to ensure anatomically coherent synthesis, while a lightweight stochastic mask generator enhances diversity through hierarchical randomness injection. Additionally, an automated quality assessment protocol filters suboptimal outputs using feature-space metrics, followed by mask corrosion to refine fidelity. Evaluated on five medical segmentation datasets, MedDiff-FT's synthetic image-mask pairs improve the segmentation performance of SOTA methods by an average of 1% in Dice score. The framework effectively balances generation quality, diversity, and computational efficiency, offering a practical solution for medical data augmentation. The code is available at https://github.com/JianhaoXie1/MedDiff-FT.
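
The post-processing step described in the abstract (filter by a feature-space metric, then shrink the mask) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the metric, the threshold, or the structuring element. The sketch assumes cosine similarity to the mean real-image feature, a hypothetical threshold of 0.8, and reads "mask corrosion" as morphological erosion with a 3x3 square. All function names are invented for illustration, and feature vectors are assumed to come from some pretrained encoder that is not shown.

```python
import numpy as np


def cosine_to_reference(features, reference):
    """Cosine similarity between each row of `features` and one reference vector."""
    features = np.asarray(features, dtype=float)
    reference = np.asarray(reference, dtype=float)
    num = features @ reference
    den = np.linalg.norm(features, axis=1) * np.linalg.norm(reference)
    return num / np.maximum(den, 1e-12)  # guard against zero-norm vectors


def erode_mask(mask, iterations=1):
    """Binary erosion with a 3x3 square; pixels outside the image count as background."""
    out = np.asarray(mask).astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        eroded = np.ones_like(out)
        # A pixel survives only if all 9 pixels in its 3x3 window are foreground.
        for dy in range(3):
            for dx in range(3):
                eroded &= padded[dy:dy + h, dx:dx + w]
        out = eroded
    return out


def filter_and_erode(features, masks, real_features, threshold=0.8, iterations=1):
    """Keep samples whose features are close to the real-data mean, then erode their masks.

    `threshold` and `iterations` are assumed values, not taken from the paper.
    """
    reference = np.asarray(real_features, dtype=float).mean(axis=0)
    scores = cosine_to_reference(features, reference)
    keep = np.flatnonzero(scores >= threshold)
    return keep, [erode_mask(masks[i], iterations) for i in keep]


# Toy data: three synthetic samples with 2-D features and 5x5 masks.
square = np.zeros((5, 5), dtype=bool)
square[1:4, 1:4] = True
keep, eroded = filter_and_erode(
    features=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]],
    masks=[square, square, square],
    real_features=[[1.0, 0.0], [1.0, 0.0]],
)
```

Eroding the mask after filtering trims uncertain boundary pixels, so the retained labels are more likely to lie inside the synthesized structure. Whether the paper uses the same ordering of details within each step should be checked against the released code.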
Problem

Research questions and friction points this paper is trying to address.

Limited high-quality medical training data availability
Diffusion models need large datasets for medical imaging
Ensuring anatomical coherence in synthetic medical images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes diffusion model with structural guidance
Uses dynamic adaptive mask for anatomical coherence
Automated quality assessment refines output fidelity
Jianhao Xie
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Peking University Shenzhen Graduate School
Ziang Zhang
Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Peking University Shenzhen Graduate School
Zhenyu Weng
South China University of Technology, China
Yuesheng Zhu
Peking University
Intelligent security
Guibo Luo
Peking University
medical imaging, privacy computing