🤖 AI Summary
In weakly supervised semantic segmentation (WSSS) of medical images, class activation maps (CAMs) suffer from severe localization bias and blurred boundaries, while conditional diffusion models (CDMs) often generate saliency maps contaminated by background noise. To address these issues, this paper proposes a novel segmentation framework that integrates a frozen CDM with pixel-wise contrastive learning. Specifically, the CDM is leveraged to extract robust feature representations; a pixel embedding space is constructed by jointly incorporating external classifier gradient maps and CAMs; and a contrastive learning–based decoder is designed to enhance foreground-background discrimination. Evaluated on four segmentation tasks across two public medical image datasets, the method consistently outperforms state-of-the-art WSSS baselines, achieving significant improvements in both segmentation accuracy and boundary delineation. These results validate the effectiveness of synergistically modeling diffusion priors and contrastive learning for weakly supervised medical image segmentation.
📝 Abstract
Weakly supervised semantic segmentation (WSSS) methods using class labels often rely on class activation maps (CAMs) to localize objects. However, traditional CAM-based methods struggle with partial activations and imprecise object boundaries due to optimization discrepancies between classification and segmentation. Recently, the conditional diffusion model (CDM) has been used as an alternative for generating segmentation masks in WSSS, leveraging its strong image generation capabilities tailored to specific class distributions. By modifying or perturbing the condition during diffusion sampling, the CDM can highlight the related objects in the generated images. Yet, the saliency maps generated by CDMs are prone to noise from background alterations during reverse diffusion. To alleviate this problem, we introduce Contrastive Learning with Diffusion Features (CLDF), a novel method that uses contrastive learning to train a pixel decoder to map the diffusion features from a frozen CDM to a low-dimensional embedding space for segmentation. Specifically, we integrate gradient maps generated from the CDM's external classifier with CAMs to identify foreground and background pixels with fewer false positives/negatives for contrastive learning, enabling robust pixel embedding learning. Experimental results on four segmentation tasks from two public medical datasets demonstrate that our method significantly outperforms existing baselines.
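The two core ideas in the abstract — keeping only pixels where the CAM and the classifier gradient map agree, then training pixel embeddings with a contrastive objective — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the thresholds (`hi`, `lo`), the temperature `tau`, and a plain supervised InfoNCE loss are all assumptions for exposition.

```python
import numpy as np

def select_reliable_pixels(cam, grad_map, hi=0.7, lo=0.3):
    """Pseudo-label pixels only where the CAM and the external-classifier
    gradient map agree, reducing false positives/negatives.
    Returns +1 (foreground), 0 (background), -1 (ignored)."""
    labels = np.full(cam.shape, -1, dtype=int)
    labels[(cam > hi) & (grad_map > hi)] = 1   # both maps confident: foreground
    labels[(cam < lo) & (grad_map < lo)] = 0   # both maps confident: background
    return labels

def pixel_contrastive_loss(emb, labels, tau=0.1):
    """Supervised InfoNCE over the reliable pixels: embeddings of same-class
    pixels are pulled together, foreground vs. background pushed apart.
    `emb` is (H, W, D) pixel embeddings from the decoder; `labels` is (H, W)."""
    keep = labels.reshape(-1) >= 0
    z = emb.reshape(-1, emb.shape[-1])[keep]
    y = labels.reshape(-1)[keep]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    log_p = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    return -log_p[pos].mean()
```

In the full method this loss would supervise the pixel decoder on top of frozen CDM features; the diffusion model itself receives no gradients.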