DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation

📅 2025-12-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address key challenges in weakly supervised semantic segmentation (WSSS) of histopathology images, namely inter-class homogeneity, intra-class heterogeneity, and the region shrinkage of class activation maps (CAMs), this paper proposes a prototype learning framework that combines text and image prototypes. Learnable textual prompts (inspired by CoOp) generate semantic prototypes, which, together with learnable visual prototypes, form a dual-modal prototype bank. A multi-scale pyramid module mitigates the excessive feature smoothing of Vision Transformers (ViTs) and improves localization accuracy. The framework also combines contrastive vision-language alignment with an improved CAM supervision strategy. On the BCSS-WSSS benchmark, the method outperforms existing state-of-the-art approaches, and ablation studies indicate that text prompt diversity, prompt context length, and the complementarity of text and visual prototypes all contribute to the segmentation gains.
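To make the CoOp-style prompt idea concrete, the sketch below builds one text prototype per class from shared learnable context vectors followed by a class-name token. It is a minimal sketch, not the authors' code: the class names, dimensions, `embed_class_name`, and `toy_text_encoder` (a mean-pool plus fixed projection standing in for CLIP's frozen text transformer) are all hypothetical.

```python
import numpy as np

EMBED_DIM = 16    # token/feature width (illustrative; real CLIP models use 512 or more)
CONTEXT_LEN = 4   # M, number of learnable context tokens (illustrative)
CLASS_NAMES = ["tumor", "stroma", "lymphocyte", "necrosis"]  # illustrative labels

rng = np.random.default_rng(0)

# Learnable context vectors [V]_1 ... [V]_M, shared by every class (CoOp's "unified context").
# In training only these would receive gradients; the text encoder stays frozen.
context = rng.normal(scale=0.02, size=(CONTEXT_LEN, EMBED_DIM))

# Fixed random matrix standing in for CLIP's frozen text transformer (hypothetical).
W_TEXT = rng.normal(size=(EMBED_DIM, EMBED_DIM)) / np.sqrt(EMBED_DIM)

def embed_class_name(name):
    """Hypothetical frozen word embedding: one deterministic vector per class name."""
    return np.random.default_rng(sum(map(ord, name))).normal(size=(1, EMBED_DIM))

def toy_text_encoder(tokens):
    """Stand-in encoder: mean-pool the prompt tokens, project, and L2-normalize."""
    feat = tokens.mean(axis=0) @ W_TEXT
    return feat / np.linalg.norm(feat)

def build_text_prototypes(context, class_names):
    """Prompt per class = [V]_1 ... [V]_M [CLASS]; its encoding is that class's text prototype."""
    protos = [toy_text_encoder(np.concatenate([context, embed_class_name(n)], axis=0))
              for n in class_names]
    return np.stack(protos)  # (num_classes, EMBED_DIM), unit-norm rows

text_protos = build_text_prototypes(context, CLASS_NAMES)
print(text_protos.shape)  # (4, 16)
```

Because the context vectors are shared across classes, updating them shifts every prototype at once, while the class-name token keeps the prototypes apart.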


๐Ÿ“ Abstract
Weakly supervised semantic segmentation (WSSS) in histopathology seeks to reduce annotation cost by learning from image-level labels, yet it remains limited by inter-class homogeneity, intra-class heterogeneity, and the region-shrinkage effect of CAM-based supervision. We propose a simple and effective prototype-driven framework that leverages vision-language alignment to improve region discovery under weak supervision. Our method integrates CoOp-style learnable prompt tuning to generate text-based prototypes and combines them with learnable image prototypes, forming a dual-modal prototype bank that captures both semantic and appearance cues. To address oversmoothing in ViT representations, we incorporate a multi-scale pyramid module that enhances spatial precision and improves localization quality. Experiments on the BCSS-WSSS benchmark show that our approach surpasses existing state-of-the-art methods, and detailed analyses demonstrate the benefits of text description diversity, context length, and the complementary behavior of text and image prototypes. These results highlight the effectiveness of jointly leveraging textual semantics and visual prototype learning for WSSS in digital pathology.
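One way a dual-modal prototype bank can turn patch features into class activation maps is sketched below. Each patch is scored by cosine similarity against the class's text prototype and against its best-matching image prototype, and the two scores are mixed. This is an illustrative assumption, not the paper's formulation: the mixing weight `alpha`, the max over image prototypes, mean pooling for image-level scores, and per-class min-max normalization are choices made here for clarity.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def dual_prototype_cams(patch_feats, text_protos, image_protos, alpha=0.5):
    """
    patch_feats : (H, W, D) patch features from the image encoder
    text_protos : (C, D)    one text prototype per class
    image_protos: (C, K, D) K learnable image prototypes per class
    alpha       : hypothetical weight on the text similarity (1 - alpha on the image side)
    Returns per-class maps in [0, 1] with shape (C, H, W) and image-level scores (C,).
    """
    f = l2_normalize(patch_feats)
    t = l2_normalize(text_protos)
    v = l2_normalize(image_protos)

    text_sim = np.einsum("hwd,cd->chw", f, t)               # cosine similarity to text prototypes
    img_sim = np.einsum("hwd,ckd->ckhw", f, v).max(axis=1)  # best-matching image prototype per class
    fused = alpha * text_sim + (1 - alpha) * img_sim         # (C, H, W)

    scores = fused.mean(axis=(1, 2))                         # global average pooling per class
    lo = fused.min(axis=(1, 2), keepdims=True)
    hi = fused.max(axis=(1, 2), keepdims=True)
    cams = (fused - lo) / (hi - lo + 1e-8)                   # per-class min-max normalization
    return cams, scores

rng = np.random.default_rng(0)
H, W, D, C, K = 8, 8, 16, 4, 3
cams, scores = dual_prototype_cams(rng.normal(size=(H, W, D)),
                                   rng.normal(size=(C, D)),
                                   rng.normal(size=(C, K, D)))
pseudo_mask = cams.argmax(axis=0)  # naive pseudo-label: highest-activation class per patch
print(cams.shape, scores.shape, pseudo_mask.shape)  # (4, 8, 8) (4,) (8, 8)
```

In a WSSS setting, the image-level `scores` would be supervised with the image-level labels, and the normalized maps would seed pseudo-masks for a segmentation head; how the paper actually combines the two modalities may differ.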
Problem

Research questions and friction points this paper is trying to address.

Reduces annotation cost in histopathology segmentation
Addresses inter-class homogeneity and intra-class heterogeneity
Improves region discovery under weak supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language alignment for region discovery
Dual-modal prototype bank with text and image
Multi-scale pyramid module to enhance precision
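The multi-scale pyramid is only named here, so the following is a generic sketch of the idea under stated assumptions. ViT patch tokens are reshaped to their 2-D grid, pooled at several scales (PSPNet-style pyramid pooling, swapped in for illustration), brought back to full resolution, and fused with the untouched full-resolution tokens. The scales and the random `proj` and `fuse` weights are hypothetical stand-ins for learned layers.

```python
import numpy as np

def avg_pool(x, s):
    """Non-overlapping s x s average pooling of an (H, W, D) grid (H, W divisible by s)."""
    H, W, D = x.shape
    return x.reshape(H // s, s, W // s, s, D).mean(axis=(1, 3))

def upsample_nearest(x, s):
    """Nearest-neighbour upsampling of an (h, w, D) grid by an integer factor s."""
    return x.repeat(s, axis=0).repeat(s, axis=1)

def init_params(dim, scales, seed=0):
    """Random stand-ins for weights that would be learned during training (hypothetical)."""
    rng = np.random.default_rng(seed)
    proj = {s: rng.normal(size=(dim, dim)) / np.sqrt(dim) for s in scales}
    n_in = dim * (1 + len(scales))
    fuse = rng.normal(size=(n_in, dim)) / np.sqrt(n_in)
    return proj, fuse

def pyramid_fuse(tokens, scales, proj, fuse):
    """
    tokens: (H, W, D) ViT patch tokens reshaped to the patch grid.
    Keeps the full-resolution tokens, adds pooled context at each scale,
    and fuses everything back to D channels with a linear layer.
    """
    branches = [tokens]                               # full-resolution path
    for s in scales:
        pooled = avg_pool(tokens, s) @ proj[s]        # coarser context, projected
        branches.append(upsample_nearest(pooled, s))  # back to (H, W, D)
    stacked = np.concatenate(branches, axis=-1)       # (H, W, D * (1 + len(scales)))
    return stacked @ fuse                             # (H, W, D)

scales = (2, 4)
tokens = np.random.default_rng(1).normal(size=(16, 16, 32))  # e.g. a 16 x 16 patch grid
proj, fuse = init_params(32, scales)
out = pyramid_fuse(tokens, scales, proj, fuse)
print(out.shape)  # (16, 16, 32)
```

Keeping the full-resolution branch in the concatenation means the fused output still carries patch-level detail while the pooled branches add coarser context; the paper's module may use different scales, operators, or fusion.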
Anh M. Vu
University of Houston, Houston, TX, USA
Khang P. Le
Ho Chi Minh City University of Technology, Vietnam
Trang T. K. Vo
University of Information Technology, Ho Chi Minh City, Vietnam
Ha Thach
University of Technology Sydney, Australia
Huy Hung Nguyen
Vin University, Hanoi, Vietnam
David Yang
Department of Computer Science, Emory University, Atlanta, GA, USA
Han H. Huynh
College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
Quynh Nguyen
University of Houston, Houston, TX, USA
Tuan M. Pham
University of Houston, Houston, TX, USA
Tuan-Anh Le
University of Houston, Houston, TX, USA
Minh H. N. Le
Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, NY, USA
Thanh-Huy Nguyen
Carnegie Mellon University
Medical Image Analysis, Computer Vision, Semi-Supervised Learning
Akash Awasthi
Machine Learning Researcher, University of Houston/BAERI/NASA Ames Research Center
Large Multimodal Models, Scientific Machine Learning
Chandra Mohan
University of Houston, Houston, TX, USA
Zhu Han
University of Houston, Houston, TX, USA
Hien Van Nguyen
Associate Professor, University of Houston
Machine Learning, Artificial Intelligence, Computer Vision, Medical Image Analysis