AI Summary
Weakly supervised semantic segmentation (WSSS) in histopathology faces three key challenges: inter-class homogeneity, intra-class heterogeneity, and CAM region shrinkage. Existing two-stage approaches rely on clustering to construct prototype banks, and they suffer from high computational overhead, sensitivity to hyperparameters, and a decoupling of prototype learning from segmentation optimization. This paper proposes the first end-to-end learnable prototype framework. It eliminates explicit clustering and jointly optimizes learnable class prototypes and the segmentation head in a single stage, introduces a diversity regularization term so that the prototypes cover more of the morphological variation within each class, and trains using only image-level labels. On the BCSS-WSSS benchmark, the method sets a new state of the art, outperforming prior methods in both mIoU and mDice. Qualitatively, predicted boundaries are sharper and fewer regions are mislabeled.
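For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mIoU and mDice are commonly computed from a predicted and a reference label map. The function name `miou_and_mdice`, the flattened-list input, and the choice to skip classes absent from both maps are assumptions made for illustration; the exact averaging convention used by the BCSS-WSSS benchmark is not stated in the text above.

```python
def miou_and_mdice(pred, target, num_classes):
    """Return (mIoU, mDice) averaged over classes present in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    Skipping absent classes is an assumption, not a benchmark rule.
    """
    ious, dices = [], []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        if tp + fp + fn == 0:
            continue  # class absent from both maps
        ious.append(tp / (tp + fp + fn))           # intersection over union
        dices.append(2 * tp / (2 * tp + fp + fn))  # Dice coefficient
    if not ious:
        return 0.0, 0.0
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Toy example with four pixels and two classes.
miou, mdice = miou_and_mdice([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(miou, 4), round(mdice, 4))
```

Dice weights the overlap twice, so for any single class it is at least as large as IoU, which is why the two scores are usually reported together.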
Abstract
Weakly supervised semantic segmentation (WSSS) in histopathology reduces the need for pixel-level labeling by learning from image-level labels, but it is hindered by inter-class homogeneity, intra-class heterogeneity, and CAM-induced region shrinkage, in which class activation maps (CAMs) built on global pooling highlight only the most discriminative areas and miss nearby regions of the same class. Recent works address these challenges by constructing a clustering-based prototype bank and then refining masks in a separate stage; however, such two-stage pipelines are costly, sensitive to hyperparameters, and decouple prototype discovery from segmentation learning, which limits both their effectiveness and their efficiency. We propose a cluster-free, one-stage learnable-prototype framework with diversity regularization that broadens coverage of intra-class morphological heterogeneity. Our approach achieves state-of-the-art (SOTA) performance on BCSS-WSSS, outperforming prior methods in mIoU and mDice. Qualitative segmentation maps show sharper boundaries and fewer mislabels, and activation heatmaps further show that, compared with clustering-based prototypes, our learnable prototypes cover more diverse and complementary regions within each class, which is consistent with the quantitative gains.
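The abstract does not give the form of the diversity regularization, so the following is only an illustrative sketch of one common choice: penalizing the mean pairwise cosine similarity among the prototypes of a single class, so that they are pushed toward different directions in feature space. The names `cosine` and `diversity_penalty`, and the use of cosine similarity itself, are assumptions and should not be read as the authors' formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors


def diversity_penalty(prototypes):
    """Mean pairwise cosine similarity among one class's prototypes.

    Hypothetical regularizer: a lower value means the prototypes are
    less redundant with one another.
    """
    k = len(prototypes)
    if k < 2:
        return 0.0
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    total = sum(cosine(prototypes[i], prototypes[j]) for i, j in pairs)
    return total / len(pairs)


# Three identical prototypes versus three mutually orthogonal ones.
collapsed = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(diversity_penalty(collapsed), diversity_penalty(spread))
```

In a one-stage setup, a term like this would typically be added to the image-level classification loss with a small weight, so that the prototypes and the segmentation head are updated by the same gradient step instead of being fit by a separate clustering pass.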