🤖 AI Summary
This study addresses the challenge of jointly modeling multi-scale biological information (macroscopic tissue morphology, microscopic cellular microenvironments, and gene expression profiles) in spatial transcriptomics (ST) while preserving spatial context. To this end, we propose the first multi-scale foundation model framework specifically designed for ST: (1) we construct SToCorpus-88M, a large-scale, high-resolution spatial transcriptomic corpus; (2) we introduce an SE(2) Transformer that explicitly encodes rotation- and translation-invariant spatial structures inherent to tissue sections; and (3) we develop a multi-scale sub-slice construction strategy coupled with self-supervised pretraining to unify hierarchical biological signals. Experiments demonstrate that our model achieves state-of-the-art performance on downstream tasks, including tissue region semantic segmentation and cell-type annotation, significantly outperforming existing methods. The framework establishes a generalizable, spatially aware representation foundation for spatial transcriptomics data.
📝 Abstract
Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving the spatial context of cells. Building foundation models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging because multi-scale information must be extracted from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironments, and gene-scale gene expression profiles. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice to construct a set of ST sub-slices that aggregate macro-, micro-, and gene-scale information. An SE(2) Transformer is then used to obtain high-quality cell representations from the sub-slices. Additionally, we construct SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data.
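To illustrate why an SE(2)-aware model suits tissue sections, the sketch below checks that pairwise distances between cell coordinates do not change when a slice is rotated and translated in the plane. This is only a minimal illustration of SE(2) invariance, not SToFM's architecture: the abstract does not say which spatial features the SE(2) Transformer consumes, and the helper names (`pairwise_distances`, `apply_se2`, `max_abs_diff`) and the random coordinates are assumptions made for this example.

```python
import math
import random


def pairwise_distances(coords):
    """Return the matrix of Euclidean distances between 2D points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def apply_se2(coords, theta, tx, ty):
    """Rotate each point by theta (radians) about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in coords]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally sized matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical cell centroids on a slice (synthetic values, not real ST data).
rng = random.Random(0)
cells = [(rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0)) for _ in range(50)]

# The same slice as it might appear if mounted at a different angle and offset.
moved = apply_se2(cells, theta=math.radians(37.0), tx=12.5, ty=-4.0)

# Distances agree up to floating-point error, so any feature built only on
# them is unaffected by the slice's arbitrary orientation and position.
diff = max_abs_diff(pairwise_distances(cells), pairwise_distances(moved))
print(diff < 1e-9)
```

Because the orientation of a tissue section on a slide is arbitrary, a representation built on quantities like these yields the same output for the same tissue regardless of how the slice was placed.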