AI Summary
Current pathology foundation models lack deep integration with single-cell molecular data, limiting their utility in precision oncology. To address this, we propose the first cross-modal pan-cancer single-cell foundation model, enabling unified representation learning of tissue histopathology images and single-cell transcriptomes at cellular resolution. Our method employs a multimodal deep neural network to jointly encode 20 million image–gene expression pairs, combining contrastive learning with cross-modal attention to align the two modalities. The model supports direct prediction of single-cell gene expression from routine H&E-stained whole-slide images, generation of virtual molecular staining maps, and multimodal survival analysis. It outperforms state-of-the-art methods across multiple cancer types, demonstrating strong generalizability and label-free predictive capability. This work establishes a novel paradigm for high-resolution spatial omics and mechanistic investigation of tumorigenesis.
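To make the contrastive alignment step concrete, here is a minimal sketch of a symmetric (CLIP-style) InfoNCE loss over paired image and gene-expression embeddings. It is an illustration under assumptions, not the model's actual objective: the summary does not specify the loss form, so the function names, the temperature of 0.07, the embedding size, and the toy random data are all hypothetical.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_emb, gene_emb, temperature=0.07):
    """Average of image-to-gene and gene-to-image InfoNCE losses.

    Row i of image_emb and row i of gene_emb are assumed to come from the
    same cell; every other row in the batch serves as a negative.
    The temperature value is an assumption, not taken from the paper.
    """
    img = [l2_normalize(v) for v in image_emb]
    gene = [l2_normalize(v) for v in gene_emb]
    # Cosine similarity of every image against every expression profile.
    logits = [
        [sum(a * b for a, b in zip(u, w)) / temperature for w in gene]
        for u in img
    ]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def random_batch(n, dim):
    """Toy stand-in for encoder outputs (hypothetical data)."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


random.seed(0)
image_emb = random_batch(8, 32)
# Paired case: gene embeddings are noisy copies of the image embeddings.
paired_gene_emb = [[v + 0.01 * random.gauss(0.0, 1.0) for v in row] for row in image_emb]
# Unpaired case: gene embeddings are unrelated to the images.
unpaired_gene_emb = random_batch(8, 32)

loss_paired = symmetric_contrastive_loss(image_emb, paired_gene_emb)
loss_unpaired = symmetric_contrastive_loss(image_emb, unpaired_gene_emb)
```

In this toy setup the loss is lower for the paired batch than for the unpaired one, which is the signal that pushes both encoders toward a shared embedding space. Cross-modal attention would be a separate component and is not sketched here.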
Abstract
While pathology foundation models have transformed cancer image analysis, they often lack integration with molecular data at single-cell resolution, limiting their utility for precision oncology. Here, we present PAST, a pan-cancer single-cell foundation model trained on 20 million paired histopathology images and single-cell transcriptomes spanning multiple tumor types and tissue contexts. By jointly encoding cellular morphology and gene expression, PAST learns unified cross-modal representations that capture both spatial and molecular heterogeneity at the cellular level. This approach enables accurate prediction of single-cell gene expression, virtual molecular staining, and multimodal survival analysis directly from routine pathology slides. Across diverse cancers and downstream tasks, PAST consistently exceeds the performance of existing approaches, demonstrating robust generalizability and scalability. Our work establishes a new paradigm for pathology foundation models, providing a versatile tool for high-resolution spatial omics, mechanistic discovery, and precision cancer research.