🤖 AI Summary
Problem: Deep alignment of whole-slide images (WSIs) with spatial transcriptomics (ST) data remains challenging, and the difficulty of jointly modeling molecular and morphological heterogeneity makes it harder still. Method: We propose the first two-stage, feature-clustering-driven mixture-of-experts framework, which (1) uses ST-guided self-supervised contrastive learning to model a joint cross-modal latent space; (2) introduces a mixture-of-data-experts mechanism that performs hierarchical clustering and dynamic fusion in both the morphological and the molecular feature spaces; and (3) constructs a shared representation space for co-registered WSI–ST pairs. Contribution/Results: Pretrained on the HEST-1k dataset, the model significantly outperforms existing baselines across 14 few-shot downstream tasks (average +5.2%). It achieves, for the first time, high-fidelity and interpretable joint molecular–morphological representations, establishing a new paradigm for multimodal computational pathology.
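The two-stage clustering that defines the data experts can be illustrated with a small sketch. It assumes, hypothetically, that stage 1 clusters image (morphological) embeddings and stage 2 re-clusters gene-expression (molecular) embeddings inside each stage-1 cluster, so every (stage-1, stage-2) pair indexes one expert. The helper names `kmeans` and `two_stage_expert_assignment`, the cluster counts, and the use of plain k-means are illustrative choices, not details taken from the paper.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns integer cluster labels of shape (n,)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points (float copy, x is untouched).
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (n, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(0)
    return labels


def two_stage_expert_assignment(img_feats, gene_feats, k_img=2, k_gene=2, seed=0):
    """Hypothetical two-stage assignment of co-registered spots to data experts.

    Stage 1 clusters image embeddings; stage 2 clusters the gene-expression
    embeddings of the spots inside each stage-1 cluster. Expert ids lie in
    [0, k_img * k_gene).
    """
    coarse = kmeans(img_feats, k_img, seed=seed)
    expert_ids = np.zeros(len(img_feats), dtype=int)
    for c in range(k_img):
        idx = np.flatnonzero(coarse == c)
        if len(idx) == 0:
            continue  # empty stage-1 cluster: no experts to create
        fine = kmeans(gene_feats[idx], min(k_gene, len(idx)), seed=seed)
        expert_ids[idx] = c * k_gene + fine
    return expert_ids
```

With `k_img = k_gene = 2` this yields up to four experts. Clustering morphology first and molecules second is only one plausible ordering; the reverse, or a joint clustering, would fit the same description.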
📝 Abstract
The rapid growth of digital pathology and advances in self-supervised deep learning have enabled the development of foundation models for various pathology tasks across diverse diseases. While multimodal approaches integrating diverse data sources have emerged, a critical gap remains in integrating whole-slide images (WSIs) with spatial transcriptomics (ST), which is needed to capture molecular heterogeneity that standard hematoxylin and eosin (H&E) staining does not reveal. We introduce SPADE, a foundation model that integrates histopathology with ST data to guide image representation learning within a unified framework, in effect creating an ST-informed latent space. SPADE uses a mixture-of-data-experts technique in which experts, created via two-stage feature-space clustering, use contrastive learning to learn representations of co-registered WSI patches and gene expression profiles. Pretrained on the comprehensive HEST-1k dataset, SPADE is evaluated on 14 downstream tasks, where it shows significantly better few-shot performance than baseline models, highlighting the benefits of integrating morphological and molecular information into one latent space.
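The contrastive objective between co-registered WSI patches and gene expression profiles can be sketched with a CLIP-style symmetric InfoNCE loss. This sketch assumes both modalities have already been projected to embeddings of the same dimension and that row `i` of each matrix comes from the same spot; the function names and the temperature of 0.07 are illustrative assumptions, and SPADE's actual loss may differ. In a mixture-of-data-experts setup, each expert would presumably apply such a loss only to batches drawn from its own cluster.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def symmetric_info_nce(img_emb, gene_emb, temperature=0.07):
    """CLIP-style contrastive loss over a batch of co-registered pairs.

    Row i of img_emb and row i of gene_emb form the positive pair; every other
    row in the batch acts as a negative. Both arrays have shape (B, D).
    """
    img = l2_normalize(img_emb)
    gene = l2_normalize(gene_emb)
    logits = img @ gene.T / temperature  # (B, B) scaled cosine similarities
    diag = np.arange(len(logits))
    # Image -> gene: each image row should pick out its own expression profile.
    loss_i2g = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Gene -> image: each expression profile should pick out its own patch.
    loss_g2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_i2g + loss_g2i)
```

Perfectly aligned, mutually distinct pairs drive the loss toward zero, while embeddings that carry no pairing information give a loss of `log(B)`.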