AI Summary
To address key challenges in weakly supervised semantic segmentation of histopathological images (inter-class homogeneity, intra-class heterogeneity, and CAM region shrinkage), this paper proposes a text-image dual-modal prototype collaborative learning framework. Learnable textual prompts (inspired by CoOp) generate semantic prototypes, which together with visual prototypes form a dynamic dual-prototype bank. A multi-scale pyramid module mitigates the excessive feature smoothing of Vision Transformers (ViTs) and improves localization accuracy, and contrastive vision-language alignment is integrated with an improved CAM supervision strategy. Evaluated on the BCSS-WSSS benchmark, the method significantly outperforms state-of-the-art approaches. Ablation studies confirm that textual prompt diversity, contextual modeling capability, and the complementary synergy between textual and visual prototypes are critical to the segmentation gains.
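To make the CoOp-style prompt tuning concrete, the sketch below shows the core idea in toy form: a set of shared learnable context vectors is prepended to each class-name embedding, and the resulting prompt is pooled into a normalized text prototype. All names, dimensions, and the mean-pooling stand-in for the text encoder are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8   # toy embedding width (hypothetical; real CLIP text encoders use 512)
CTX_LEN = 4     # number of learnable context tokens [V1]..[Vm], as in CoOp

# Hypothetical frozen token embeddings for each tissue class name.
class_token_embeds = {
    "tumor":  rng.normal(size=(1, EMBED_DIM)),
    "stroma": rng.normal(size=(1, EMBED_DIM)),
}

# Shared learnable context vectors; in practice these are optimized by backprop
# through the frozen text encoder, here they are just random placeholders.
context = rng.normal(size=(CTX_LEN, EMBED_DIM))

def text_prototype(class_name):
    """Build a prompt [V1..Vm][CLASS] and pool it into one text prototype."""
    prompt = np.concatenate([context, class_token_embeds[class_name]], axis=0)
    proto = prompt.mean(axis=0)            # stand-in for the text encoder output
    return proto / np.linalg.norm(proto)   # L2-normalize for cosine similarity

text_protos = {name: text_prototype(name) for name in class_token_embeds}
```

Because the context vectors are shared across classes and learned end to end, the prompts can drift toward histopathology-specific wording without hand-crafted templates.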
Abstract
Weakly supervised semantic segmentation (WSSS) in histopathology seeks to reduce annotation cost by learning from image-level labels, yet it remains limited by inter-class homogeneity, intra-class heterogeneity, and the region-shrinkage effect of CAM-based supervision. We propose a simple and effective prototype-driven framework that leverages vision-language alignment to improve region discovery under weak supervision. Our method integrates CoOp-style learnable prompt tuning to generate text-based prototypes and combines them with learnable image prototypes, forming a dual-modal prototype bank that captures both semantic and appearance cues. To address oversmoothing in ViT representations, we incorporate a multi-scale pyramid module that enhances spatial precision and improves localization quality. Experiments on the BCSS-WSSS benchmark show that our approach surpasses existing state-of-the-art methods, and detailed analyses demonstrate the benefits of text description diversity, context length, and the complementary behavior of text and image prototypes. These results highlight the effectiveness of jointly leveraging textual semantics and visual prototype learning for WSSS in digital pathology.
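The dual-modal prototype bank described above can be sketched as follows: each class holds one text and one image prototype, and patch features are scored against both banks via cosine similarity, with a weight balancing semantic and appearance cues. The array shapes, the weighting scheme, and the `alpha` parameter are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 8, 3  # toy feature dimension and number of tissue classes

def l2norm(x):
    """Row-wise L2 normalization so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Dual-modal prototype bank: one text and one image prototype per class
# (random placeholders; in the framework both are learned).
text_protos = l2norm(rng.normal(size=(K, D)))
image_protos = l2norm(rng.normal(size=(K, D)))

def classify(patch_feats, alpha=0.5):
    """Score patch features against both banks; alpha trades off text vs image cues."""
    f = l2norm(patch_feats)
    scores = alpha * f @ text_protos.T + (1.0 - alpha) * f @ image_protos.T
    return scores.argmax(axis=-1)

labels = classify(rng.normal(size=(5, D)))
```

Fusing the two score maps lets text prototypes supply class semantics where visual prototypes are ambiguous (inter-class homogeneity), while image prototypes capture appearance variation within a class (intra-class heterogeneity).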