LEAF: Latent Diffusion with Efficient Encoder Distillation for Aligned Features in Medical Image Segmentation

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak feature extraction and poorly adapted training paradigm of diffusion models in medical image segmentation, this paper proposes LEAF, a framework based on latent diffusion. During fine-tuning, LEAF abandons conventional noise prediction and instead directly regresses the segmentation mask, which reduces output variance. LEAF also introduces feature distillation that aligns the intermediate representations of the convolutional hidden layers with those of a Transformer-based visual encoder, for the first time without modifying the network architecture. It further adopts an efficient fine-tuning strategy with a frozen backbone. Evaluated on medical segmentation benchmarks covering multiple diseases and imaging modalities, LEAF outperforms the baseline diffusion models, which indicates both effectiveness and good generalization. Its core innovations are (1) a task-adapted direct prediction paradigm that bypasses iterative denoising and (2) a latent-state alignment mechanism that bridges the architectural gap between convolutional and attention-based encoders.

📝 Abstract
Leveraging the powerful capabilities of diffusion models has yielded quite effective results in medical image segmentation tasks. However, existing methods typically transfer the original training process directly without specific adjustments for segmentation tasks. Furthermore, the commonly used pre-trained diffusion models still have deficiencies in feature extraction. Based on these considerations, we propose LEAF, a medical image segmentation model grounded in latent diffusion models. During the fine-tuning process, we replace the original noise prediction pattern with a direct prediction of the segmentation map, thereby reducing the variance of segmentation results. We also employ a feature distillation method to align the hidden states of the convolutional layers with the features from a transformer-based vision encoder. Experimental results demonstrate that our method enhances the performance of the original diffusion model across multiple segmentation datasets for different disease types. Notably, our approach does not alter the model architecture, nor does it increase the number of parameters or computation during the inference phase, making it highly efficient.
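The change of training target described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `diffusion_training_loss`, the tensor shapes, and the use of mean squared error are assumptions made for illustration, and the abstract does not say how (or whether) the network input is noised. The sketch only shows that the two objectives differ in what the network output is compared against.

```python
import numpy as np


def diffusion_training_loss(pred, noise, mask_latent, target="mask"):
    """Mean squared error against the chosen regression target.

    target="noise": conventional objective, regress the added noise.
    target="mask":  direct objective, regress the clean mask latent.
    (Function name and MSE are illustrative assumptions.)
    """
    if target == "noise":
        reference = noise
    elif target == "mask":
        reference = mask_latent
    else:
        raise ValueError(f"unknown target: {target!r}")
    return float(np.mean((pred - reference) ** 2))


# Toy arrays standing in for a 4-channel 8x8 latent (shapes are assumptions).
rng = np.random.default_rng(0)
mask_latent = rng.standard_normal((4, 8, 8))
noise = rng.standard_normal((4, 8, 8))
pred = mask_latent.copy()  # pretend the network recovered the mask exactly

loss_direct = diffusion_training_loss(pred, noise, mask_latent, target="mask")
loss_noise = diffusion_training_loss(pred, noise, mask_latent, target="noise")
```

With the direct target, a network that outputs the correct mask latent is already at zero loss, whereas under the noise target the same output is penalized because it is being compared with the noise rather than the mask.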
Problem

Research questions and friction points this paper is trying to address.

Improves medical image segmentation using latent diffusion models
Aligns features via efficient encoder distillation for better accuracy
Enhances performance without increasing model parameters or computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct segmentation map prediction replaces noise prediction
Feature distillation aligns convolutional and transformer features
No architecture change or added inference parameters
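One way to picture the feature alignment listed above is the following NumPy sketch. It is a hedged illustration rather than the method from the paper: the cosine-similarity loss, the linear projection `proj`, and the assumption that the convolutional feature map has as many spatial positions as the encoder has tokens are all hypothetical choices. If such a projection were used only during training and discarded afterwards, it would be consistent with the claim of no added inference parameters.

```python
import numpy as np


def feature_alignment_loss(conv_hidden, encoder_tokens, proj):
    """1 - mean cosine similarity between projected conv features and encoder tokens.

    conv_hidden:    (C, H, W) hidden state from a convolutional layer
    encoder_tokens: (H*W, D) features from a frozen transformer encoder
    proj:           (C, D) training-only linear projection (assumption)
    """
    c, h, w = conv_hidden.shape
    tokens = conv_hidden.reshape(c, h * w).T  # (H*W, C): one row per spatial position
    projected = tokens @ proj                 # (H*W, D): match the encoder width
    eps = 1e-8                                # guards against division by zero
    p = projected / (np.linalg.norm(projected, axis=1, keepdims=True) + eps)
    e = encoder_tokens / (np.linalg.norm(encoder_tokens, axis=1, keepdims=True) + eps)
    cosine = np.sum(p * e, axis=1)            # per-position cosine similarity
    return float(1.0 - cosine.mean())


# Toy example: 16-channel 4x4 feature map, 16 encoder tokens of width 32.
rng = np.random.default_rng(0)
conv_hidden = rng.standard_normal((16, 4, 4))
encoder_tokens = rng.standard_normal((16, 32))
proj = rng.standard_normal((16, 32))

alignment_loss = feature_alignment_loss(conv_hidden, encoder_tokens, proj)
```

The loss lies between 0 (projected features point in the same direction as the encoder features at every position) and 2 (opposite directions), so minimizing it pulls the convolutional hidden states toward the encoder's representation.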
Qilin Huang
School of Computer Science and Engineering, Sun Yat-sen University, China
Tianyu Lin
Johns Hopkins University
Medical Image Analysis, Computer Vision
Zhiguang Chen
School of Computer Science and Engineering, Sun Yat-sen University, China
Fudan Zheng
School of Computer Science and Engineering, Sun Yat-sen University, China