Contrastive Learning with Diffusion Features for Weakly Supervised Medical Image Segmentation

📅 2025-06-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
In weakly supervised semantic segmentation (WSSS) of medical images, class activation maps (CAMs) suffer from severe localization bias and blurred boundaries, while conditional diffusion models (CDMs) often generate saliency maps contaminated by background noise. To address these issues, this paper proposes a novel segmentation framework that integrates a frozen CDM with pixel-wise contrastive learning. Specifically, the CDM is leveraged to extract robust feature representations; a pixel embedding space is constructed by jointly incorporating external classifier gradient maps and CAMs; and a contrastive learning–based decoder is designed to enhance foreground-background discrimination. Evaluated on four segmentation tasks across two public medical image datasets, the method consistently outperforms state-of-the-art WSSS baselines, achieving significant improvements in both segmentation accuracy and boundary delineation. These results validate the effectiveness of synergistically modeling diffusion priors and contrastive learning for weakly supervised medical image segmentation.

📝 Abstract
Weakly supervised semantic segmentation (WSSS) methods using class labels often rely on class activation maps (CAMs) to localize objects. However, traditional CAM-based methods struggle with partial activations and imprecise object boundaries due to optimization discrepancies between classification and segmentation. Recently, the conditional diffusion model (CDM) has been used as an alternative for generating segmentation masks in WSSS, leveraging its strong image generation capabilities tailored to specific class distributions. By modifying or perturbing the condition during diffusion sampling, the related objects can be highlighted in the generated images. Yet, the saliency maps generated by CDMs are prone to noise from background alterations during reverse diffusion. To alleviate this problem, we introduce Contrastive Learning with Diffusion Features (CLDF), a novel method that uses contrastive learning to train a pixel decoder to map the diffusion features from a frozen CDM to a low-dimensional embedding space for segmentation. Specifically, we integrate gradient maps generated from the CDM's external classifier with CAMs to identify foreground and background pixels with fewer false positives/negatives for contrastive learning, enabling robust pixel embedding learning. Experimental results on four segmentation tasks from two public medical datasets demonstrate that our method significantly outperforms existing baselines.
Problem

Research questions and friction points this paper is trying to address.

Enhance weakly supervised medical image segmentation accuracy
Reduce noise in saliency maps from diffusion models
Improve object boundary precision in segmentation masks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses contrastive learning with diffusion features
Integrates classifier gradient maps and CAMs for reliable foreground/background pixel selection
Trains pixel decoder for robust segmentation
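The training objective behind these contributions — pulling confident foreground pixel embeddings together while pushing them away from confident background pixels — is a supervised, InfoNCE-style contrastive loss. The sketch below is a minimal NumPy illustration of that general technique, not the paper's implementation; the function name and mask arguments are assumptions, and in practice the confident `fg_mask`/`bg_mask` would come from the agreement between classifier gradient maps and CAMs described above.

```python
import numpy as np

def pixel_contrastive_loss(embeddings, fg_mask, bg_mask, temperature=0.1):
    """Supervised InfoNCE-style loss over confident pixels.

    embeddings: (N, D) per-pixel embeddings from the decoder
    fg_mask, bg_mask: (N,) boolean masks of confident fg/bg pixels
    (requires at least two foreground pixels)
    """
    # Work with unit-norm embeddings so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    fg, bg = z[fg_mask], z[bg_mask]
    nf = fg.shape[0]

    # Anchor each foreground pixel against all other foreground pixels
    # (positives) and all background pixels (negatives).
    pos = fg @ fg.T / temperature               # (nf, nf)
    neg = fg @ bg.T / temperature               # (nf, nb)
    logits = np.concatenate([pos, neg], axis=1)
    np.fill_diagonal(logits[:, :nf], -np.inf)   # drop self-similarity

    # Row-wise log-softmax via a numerically stable log-sum-exp.
    m = logits.max(axis=1, keepdims=True)
    lse = m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    log_prob = logits - lse

    # Average log-probability of the true positives per anchor.
    pos_lp = log_prob[:, :nf].copy()
    np.fill_diagonal(pos_lp, 0.0)               # zero out the -inf self terms
    return -(pos_lp.sum(axis=1) / (nf - 1)).mean()
```

Minimizing this loss drives same-class pixel embeddings toward each other and apart from the other class, which is what sharpens the foreground-background decision boundary in the learned embedding space.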
Dewen Zeng
University of Notre Dame, Notre Dame, IN, USA
Xinrong Hu
University of Notre Dame, Notre Dame, IN, USA
Yu-Jen Chen
National Tsing Hua University, Taiwan
Yawen Wu
Applied Scientist at Amazon AWS AI
Large Language Models, Efficient Machine Learning
Xiaowei Xu
Guangdong Provincial People’s Hospital, Guangzhou, China
Yiyu Shi
Full Professor, University of Notre Dame
hardware/software co-design, deep learning acceleration, on-device AI, AI for healthcare