AI Summary
To address incomplete pseudo-mask localization and poor cross-tissue semantic consistency in weakly supervised semantic segmentation (WSSS) of histopathological images, this paper proposes a text-guided prototype learning framework. Methodologically, it is the first to integrate CONCH's morphology-aware vision-language representations with SegFormer's multi-scale spatial architecture, introducing text-conditioned prototype initialization and structured knowledge distillation to jointly optimize semantic discriminability and spatial coherence without pixel-level annotations. Technically, it adopts a frozen-backbone-plus-lightweight-adapter paradigm. Evaluated on the BCSS-WSSS dataset, the method significantly outperforms existing WSSS approaches: it yields more complete pseudo-masks, achieves superior cross-tissue semantic consistency, and maintains high computational efficiency.
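The text-conditioned prototype initialization described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, feature dimensions, and temperature value are assumptions, and random tensors stand in for CONCH text/image features.

```python
import torch
import torch.nn.functional as F

def init_prototypes(text_embeddings: torch.Tensor) -> torch.nn.Parameter:
    """Initialize one prototype per tissue class from text embeddings,
    e.g. CONCH encodings of pathology descriptions (hypothetical shapes)."""
    # text_embeddings: (num_classes, dim); prototypes stay trainable
    return torch.nn.Parameter(F.normalize(text_embeddings, dim=-1))

def pseudo_mask(patch_feats: torch.Tensor, prototypes: torch.Tensor,
                tau: float = 0.07) -> torch.Tensor:
    """Assign each patch to its most similar prototype by cosine similarity."""
    # patch_feats: (H*W, dim); prototypes: (num_classes, dim)
    sim = F.normalize(patch_feats, dim=-1) @ F.normalize(prototypes, dim=-1).T
    return (sim / tau).softmax(dim=-1).argmax(dim=-1)  # (H*W,) class ids

# toy usage with random tensors standing in for real features
text_emb = torch.randn(4, 512)        # 4 tissue classes, 512-d embeddings
protos = init_prototypes(text_emb)
feats = torch.randn(64 * 64, 512)     # a 64x64 grid of patch features
mask = pseudo_mask(feats, protos)
```

Initializing prototypes from text rather than random vectors gives each class a semantically meaningful starting point, which is what the summary credits for more complete pseudo-masks.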
Abstract
Weakly supervised semantic segmentation (WSSS) in histopathology relies heavily on classification backbones, yet these models often localize only the most discriminative regions and fail to capture the full spatial extent of tissue structures. Vision-language models such as CONCH offer rich semantic alignment and morphology-aware representations, while modern segmentation backbones like SegFormer preserve fine-grained spatial cues. However, combining these complementary strengths remains challenging, especially under weak supervision without dense annotations. We propose a prototype learning framework for WSSS in histopathological images that integrates morphology-aware representations from CONCH, multi-scale structural cues from SegFormer, and text-guided semantic alignment to produce prototypes that are simultaneously semantically discriminative and spatially coherent. To effectively leverage these heterogeneous sources, we introduce text-guided prototype initialization that incorporates pathology descriptions to generate more complete and semantically accurate pseudo-masks. A structural distillation mechanism transfers spatial knowledge from SegFormer to preserve fine-grained morphological patterns and local tissue boundaries during prototype learning. Our approach produces high-quality pseudo-masks without pixel-level annotations, improves localization completeness, and enhances semantic consistency across tissue types. Experiments on the BCSS-WSSS dataset demonstrate that our prototype learning framework outperforms existing WSSS methods while remaining computationally efficient through frozen foundation-model backbones and lightweight trainable adapters.
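The structural distillation mechanism can be sketched as an affinity-matching loss: the student's pairwise patch similarities are pulled toward those of a frozen SegFormer teacher, so local tissue structure is preserved while prototypes are learned. This is an illustrative formulation under assumed shapes; the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def structural_distillation_loss(student_feats: torch.Tensor,
                                 teacher_feats: torch.Tensor) -> torch.Tensor:
    """Match the student's patch-to-patch affinity map to a frozen
    SegFormer teacher's (hypothetical loss; shapes are (N, dim))."""
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    aff_s = s @ s.T                 # (N, N) student cosine affinities
    aff_t = (t @ t.T).detach()      # teacher is frozen: no gradients flow
    return F.mse_loss(aff_s, aff_t)

# toy usage: feature dims may differ, affinity maps still align at (N, N)
loss = structural_distillation_loss(torch.randn(256, 512),
                                    torch.randn(256, 768))
```

Distilling affinities rather than raw features sidesteps the dimension mismatch between backbones, which fits the frozen-backbone-plus-adapter setup the abstract describes.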