UniSemAlign: Text-Prototype Alignment with a Foundation Encoder for Semi-Supervised Histopathology Segmentation

📅 2026-04-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of semi-supervised semantic segmentation in computational pathology, where pixel-level annotations are scarce and pseudo-labels are often unreliable. The authors propose a dual-modality semantic alignment framework that, for the first time, integrates text–prototype dual semantic alignment into histopathology image segmentation. Built upon a pathology-pretrained Transformer encoder, the method jointly optimizes prototype-level and text-level alignment branches within a shared embedding space, enhanced by cross-view consistency constraints and multi-objective end-to-end training to mitigate class ambiguity and improve pseudo-label quality. Evaluated on the GlaS and CRAG datasets, the approach achieves state-of-the-art performance, yielding Dice score improvements of up to 2.6% and 8.6%, respectively, using only 10% of labeled data.

๐Ÿ“ Abstract
Semi-supervised semantic segmentation in computational pathology remains challenging due to scarce pixel-level annotations and unreliable pseudo-label supervision. We propose UniSemAlign, a dual-modal semantic alignment framework that enhances visual segmentation by injecting explicit class-level structure into pixel-wise learning. Built upon a pathology-pretrained Transformer encoder, UniSemAlign introduces complementary prototype-level and text-level alignment branches in a shared embedding space, providing structured guidance that reduces class ambiguity and stabilizes pseudo-label refinement. The aligned representations are fused with visual predictions to generate more reliable supervision for unlabeled histopathology images. The framework is trained end-to-end with supervised segmentation, cross-view consistency, and cross-modal alignment objectives. Extensive experiments on the GlaS and CRAG datasets demonstrate that UniSemAlign substantially outperforms recent semi-supervised baselines under limited supervision, achieving Dice improvements of up to 2.6% on GlaS and 8.6% on CRAG with only 10% labeled data, and strong improvements at 20% supervision. Code is available at: https://github.com/thailevann/UniSemAlign
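The core idea in the abstract, fusing prototype-level and text-level similarity in a shared embedding space to produce more reliable pseudo-labels, can be sketched in a few lines. This is not the authors' implementation (their code is linked above); it is a minimal numpy sketch assuming cosine similarity for both branches and a hypothetical fusion weight `alpha` balancing the prototype and text terms:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Normalize row vectors so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def class_prototypes(features, labels, num_classes):
    """Mean embedding per class from the labeled pixels: (C, D)."""
    protos = np.stack([features[labels == c].mean(axis=0)
                       for c in range(num_classes)])
    return l2_normalize(protos)

def dual_alignment_logits(pixel_feats, prototypes, text_embeds, alpha=0.5):
    """Fuse pixel-prototype and pixel-text cosine similarities.

    alpha is an assumed fusion weight; the paper trains the fusion
    end-to-end with supervised, consistency, and alignment losses.
    """
    f = l2_normalize(pixel_feats)            # (N, D)
    proto_sim = f @ prototypes.T             # (N, C) visual branch
    text_sim = f @ l2_normalize(text_embeds).T  # (N, C) text branch
    return alpha * proto_sim + (1 - alpha) * text_sim

# Toy usage: 6 pixel embeddings, 2 classes, 4-dim embedding space.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])       # labeled pixels
protos = class_prototypes(feats, labels, num_classes=2)
text_embeds = rng.normal(size=(2, 4))       # stand-in class-name embeddings
logits = dual_alignment_logits(feats, protos, text_embeds)
pseudo_labels = logits.argmax(axis=1)       # refined pseudo-labels
```

In the actual framework the text embeddings would come from a language encoder over class descriptions, and the fused logits would supervise the unlabeled branch rather than be used directly.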
Problem

Research questions and friction points this paper is trying to address.

semi-supervised segmentation
computational pathology
pixel-level annotations
pseudo-label supervision
histopathology
Innovation

Methods, ideas, or system contributions that make the work stand out.

semi-supervised segmentation
text-prototype alignment
foundation model
histopathology
cross-modal learning
Le-Van Thai
AI VIETNAM Lab, Vietnam
Tien Dat Nguyen
Master Student, University of Waterloo
Natural language processing · Machine Learning · Computer Vision
Hoai Nhan Pham
AI VIETNAM Lab, Vietnam
Lan Anh Dinh Thi
Hanoi University of Science and Technology, Vietnam
Duy-Dong Nguyen
AI VIETNAM Lab, Vietnam
Ngoc Lam Quang Bui
AI VIETNAM Lab, Vietnam