ConStruct: Structural Distillation of Foundation Models for Prototype-Based Weakly Supervised Histopathology Segmentation

πŸ“… 2025-12-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address incomplete pseudo-mask localization and poor cross-tissue semantic consistency in weakly supervised semantic segmentation (WSSS) of histopathological images, this paper proposes a text-guided prototype learning framework. Methodologically, it is the first to integrate CONCH’s morphology-aware vision-language representations with SegFormer’s multi-scale spatial architecture, introducing text-conditioned prototype initialization and structured knowledge distillation to jointly optimize semantic discriminability and spatial coherence without pixel-level annotations. Technically, it adopts a frozen backbone plus lightweight adapter paradigm. Evaluated on the BCSS-WSSS dataset, the method significantly outperforms existing WSSS approaches: it yields more complete pseudo-masks, achieves superior cross-tissue semantic consistency, and maintains high computational efficiency.
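The text-conditioned prototype initialization described in the summary can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual implementation: the function names, feature dimensions, and the cosine-similarity pixel assignment are assumptions; in the paper, the class text embeddings would come from CONCH's text encoder and the pixel features from the frozen vision backbone.

```python
import numpy as np

def init_prototypes(text_emb):
    # L2-normalize per-class text embeddings so cosine similarity
    # reduces to a dot product (hypothetical initialization step)
    return text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

def pseudo_mask(pixel_feats, prototypes):
    # pixel_feats: (H, W, D) dense features, prototypes: (C, D)
    feats = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    sim = feats @ prototypes.T        # (H, W, C) cosine similarities
    return sim.argmax(axis=-1)        # per-pixel class assignment

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 512))  # 4 tissue classes, 512-dim (assumed)
protos = init_prototypes(text_emb)
feats = rng.normal(size=(8, 8, 512))  # stand-in for backbone features
mask = pseudo_mask(feats, protos)
print(mask.shape)  # (8, 8)
```

The point of the sketch is only the mechanism: prototypes start from text embeddings rather than random vectors, so the resulting pseudo-masks inherit the semantics of the pathology descriptions from the outset.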

πŸ“ Abstract
Weakly supervised semantic segmentation (WSSS) in histopathology relies heavily on classification backbones, yet these models often localize only the most discriminative regions and struggle to capture the full spatial extent of tissue structures. Vision-language models such as CONCH offer rich semantic alignment and morphology-aware representations, while modern segmentation backbones like SegFormer preserve fine-grained spatial cues. However, combining these complementary strengths remains challenging, especially under weak supervision and without dense annotations. We propose a prototype learning framework for WSSS in histopathological images that integrates morphology-aware representations from CONCH, multi-scale structural cues from SegFormer, and text-guided semantic alignment to produce prototypes that are simultaneously semantically discriminative and spatially coherent. To effectively leverage these heterogeneous sources, we introduce text-guided prototype initialization that incorporates pathology descriptions to generate more complete and semantically accurate pseudo-masks. A structural distillation mechanism transfers spatial knowledge from SegFormer to preserve fine-grained morphological patterns and local tissue boundaries during prototype learning. Our approach produces high-quality pseudo-masks without pixel-level annotations, improves localization completeness, and enhances semantic consistency across tissue types. Experiments on the BCSS-WSSS dataset demonstrate that our prototype learning framework outperforms existing WSSS methods while remaining computationally efficient through frozen foundation model backbones and lightweight trainable adapters.
Problem

Research questions and friction points this paper is trying to address.

Develops prototype learning for weakly supervised histopathology segmentation
Integrates morphology-aware and structural cues without dense annotations
Enhances semantic consistency and localization completeness in tissue segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates vision-language and segmentation models for prototype learning
Uses text-guided initialization for semantic accuracy in pseudo-masks
Applies structural distillation to preserve fine-grained spatial patterns
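One common way to realize the structural distillation idea listed above is to match pairwise feature affinities between teacher and student. The sketch below is an assumption-laden illustration in NumPy, not the paper's method: `affinity` and `structural_distill_loss` are hypothetical names, and the cosine-affinity MSE objective is just one plausible instantiation of transferring spatial structure from a teacher (e.g., SegFormer features) to the prototype learner.

```python
import numpy as np

def affinity(feats):
    # feats: (N, D) patch/pixel features -> (N, N) cosine-affinity matrix,
    # encoding the pairwise spatial-semantic structure of the feature map
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def structural_distill_loss(student, teacher):
    # Penalize mismatch between the two affinity structures, so the
    # student preserves the teacher's fine-grained spatial relations
    return np.mean((affinity(student) - affinity(teacher)) ** 2)

rng = np.random.default_rng(0)
s = rng.normal(size=(16, 64))  # student features (assumed shapes)
t = rng.normal(size=(16, 64))  # teacher features, e.g., from SegFormer
loss = structural_distill_loss(s, t)
```

Because only relational structure is matched, the teacher and student need not share an embedding space, which is convenient when the student's prototypes live in CONCH's vision-language space.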
Khang Le
Ho Chi Minh City University of Technology, Vietnam
Ha Thach
University of Technology Sydney, Australia
Anh M. Vu
University of Houston, Houston, TX, USA
Trang T. K. Vo
University of Information Technology, Ho Chi Minh City, Vietnam
Han H. Huynh
College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
David Yang
Department of Computer Science, Emory University, Atlanta, GA, USA
Minh H. N. Le
Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, NY, USA
Thanh-Huy Nguyen
Carnegie Mellon University
Medical Image Analysis Β· Computer Vision Β· Semi-Supervised Learning
Akash Awasthi
Machine Learning Researcher, University of Houston / BAERI / NASA Ames Research Center
Large Multimodal Models Β· Scientific Machine Learning
Chandra Mohan
University of Houston, Houston, TX, USA
Zhu Han
University of Houston, Houston, TX, USA
Hien Van Nguyen
Associate Professor, University of Houston
Machine Learning Β· Artificial Intelligence Β· Computer Vision Β· Medical Image Analysis