LEAF: Latent Diffusion with Efficient Encoder Distillation for Aligned Features in Medical Image Segmentation

📅 2025-07-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Pre-trained diffusion models extract features weakly, and their training procedures are not adapted to medical image segmentation. To address both problems, this paper proposes LEAF, a latent diffusion-based framework. LEAF drops conventional noise prediction and instead regresses the segmentation mask directly, which reduces output variance. Without modifying the network architecture, it uses feature distillation to align the hidden states of the convolutional layers with features from a Transformer-based vision encoder. It also adopts an efficient fine-tuning strategy with a frozen backbone. LEAF was evaluated on medical segmentation benchmarks spanning multiple diseases and modalities. It outperforms the baseline diffusion model, which indicates both effectiveness and generalization. Its core innovations are:

(1) a task-adapted objective that predicts the mask directly instead of the noise;

(2) a latent-state alignment mechanism that bridges the architectural gap between convolutional and attention-based encoders.
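Here is one standard way to see why regressing the mask directly can reduce output variance. This is an illustrative argument, not taken from the paper. It assumes the usual forward process on the mask latent z_0 with cumulative noise schedule ᾱ_t. Suppose the network predicts the noise with error δ. The recovered mask latent then inherits that error, scaled by a timestep-dependent factor. A direct prediction with the same error δ has no such factor:

```latex
\begin{aligned}
z_t &= \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \\
\hat z_0^{(\epsilon)} &= \frac{z_t - \sqrt{1-\bar\alpha_t}\,(\epsilon + \delta)}{\sqrt{\bar\alpha_t}}
  = z_0 - \sqrt{\frac{1-\bar\alpha_t}{\bar\alpha_t}}\;\delta, \\
\hat z_0^{(\mathrm{direct})} &= z_0 + \delta .
\end{aligned}
```

At high-noise timesteps ᾱ_t is close to 0, so the amplification factor becomes large. For example, it is about 4.36 when ᾱ_t = 0.05.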

📝 Abstract

Diffusion models have achieved strong results in medical image segmentation. However, existing methods typically reuse the original diffusion training process without adapting it to segmentation. Moreover, commonly used pre-trained diffusion models remain weak at feature extraction. To address these issues, we propose LEAF, a medical image segmentation model built on latent diffusion models. During fine-tuning, we replace the original noise-prediction objective with direct prediction of the segmentation map, which reduces the variance of the segmentation results. We also use feature distillation to align the hidden states of the convolutional layers with features from a transformer-based vision encoder. Experiments show that our method improves on the original diffusion model across multiple segmentation datasets covering different disease types. Notably, our approach does not alter the model architecture, nor does it add parameters or computation at inference, making it highly efficient.
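The NumPy sketch below illustrates this change of training target. It assumes the standard DDPM forward process applied to a mask latent. The function names, latent shape, and timestep value are hypothetical and do not come from the LEAF code. The sketch gives both parameterizations the same prediction error, then compares the resulting error on the recovered mask latent.

```python
import numpy as np

def add_noise(z0, eps, alpha_bar):
    """Forward process: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps

def x0_from_eps(z_t, eps_hat, alpha_bar):
    """Recover the mask latent from a noise prediction (standard DDPM inversion)."""
    return (z_t - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)

def eps_loss(eps_hat, eps):
    """Conventional objective: regress the injected noise."""
    return float(np.mean((eps_hat - eps) ** 2))

def direct_loss(z0_hat, z0):
    """Direct objective (hypothetical name): regress the mask latent itself."""
    return float(np.mean((z0_hat - z0) ** 2))

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 8, 8))   # hypothetical mask latent (C, H, W)
eps = rng.standard_normal(z0.shape)
alpha_bar = 0.05                       # a high-noise timestep (assumed value)
z_t = add_noise(z0, eps, alpha_bar)

# Give both parameterizations a prediction error of the same size.
delta = 0.1 * rng.standard_normal(z0.shape)

# Error on the mask latent when it is recovered from a noise prediction...
mse_via_eps = direct_loss(x0_from_eps(z_t, eps + delta, alpha_bar), z0)
# ...versus when the mask latent is predicted directly.
mse_direct = direct_loss(z0 + delta, z0)

ratio = mse_via_eps / mse_direct  # approximately (1 - alpha_bar) / alpha_bar
print(f"noise-target MSE: {eps_loss(eps + delta, eps):.4f}")
print(f"mask-latent MSE, via eps: {mse_via_eps:.4f}, direct: {mse_direct:.4f}")
print(f"amplification: {ratio:.2f}")
```

Both training losses are essentially equal here, since each is the mean of δ². Recovering the mask from a noise prediction, however, inflates the squared error by (1 - ᾱ)/ᾱ = 19 at this timestep.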

Problem

Improves medical image segmentation using latent diffusion models

Aligns features via efficient encoder distillation for better accuracy

Enhances performance without increasing model parameters or computation

Innovation

Direct segmentation map prediction replaces noise prediction

Feature distillation aligns convolutional and transformer features

No architecture change or added inference parameters
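The feature-distillation step can be sketched as follows, assuming a REPA-style objective. A small projection head maps each spatial position of a convolutional hidden state to the vision encoder's feature width. The loss is the negative mean cosine similarity between those projections and the encoder's patch tokens. The abstract does not specify the loss or the head, so the head, the sizes, and all names here are hypothetical. The sketch also assumes the hidden state has already been resized to the encoder's patch grid. Because the head is used only to compute a training loss, it can be discarded after fine-tuning. That is consistent with the claim that inference cost is unchanged.

```python
import numpy as np

def flatten_tokens(h):
    """(C, H, W) conv hidden state -> (H*W, C) tokens in row-major order."""
    c, hh, ww = h.shape
    return h.reshape(c, hh * ww).T

def project(tokens, w1, w2):
    """Hypothetical 2-layer ReLU MLP head (training only), mapping C -> D."""
    return np.maximum(tokens @ w1, 0.0) @ w2

def alignment_loss(pred, target, eps=1e-8):
    """Negative mean cosine similarity between position-matched tokens."""
    pred_n = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    tgt_n = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return float(-np.mean(np.sum(pred_n * tgt_n, axis=-1)))

rng = np.random.default_rng(0)
C, H, W, D = 64, 16, 16, 384                  # hypothetical sizes
h = rng.standard_normal((C, H, W))            # U-Net hidden state on the patch grid
w1 = 0.1 * rng.standard_normal((C, 256))      # projection head weights (discarded later)
w2 = 0.1 * rng.standard_normal((256, D))
enc_feats = rng.standard_normal((H * W, D))   # frozen vision-encoder patch tokens

loss = alignment_loss(project(flatten_tokens(h), w1, w2), enc_feats)
print(f"alignment loss: {loss:.4f}")          # lies in [-1, 1]; -1 means perfectly aligned
```

During fine-tuning, this term would be added to the segmentation loss with a weighting coefficient (a hypothetical hyperparameter). At inference, only the diffusion U-Net runs.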