SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics

📅 2025-07-15
🤖 AI Summary
This study addresses the challenge of jointly modeling multi-scale biological information—macroscopic tissue morphology, microscopic cellular microenvironments, and gene expression profiles—in spatial transcriptomics (ST) while preserving spatial context. To this end, we propose the first multi-scale foundation model framework specifically designed for ST: (1) we construct SToCorpus-88M, a large-scale, high-resolution spatial transcriptomic corpus; (2) we introduce an SE(2) Transformer that explicitly encodes rotation- and translation-invariant spatial structures inherent to tissue sections; and (3) we develop a multi-scale sub-tile construction strategy coupled with self-supervised pretraining to unify hierarchical biological signals. Experiments demonstrate that our model achieves state-of-the-art performance on downstream tasks—including tissue region semantic segmentation and cell-type annotation—significantly outperforming existing methods. The framework establishes a generalizable, spatially aware representation foundation for spatial transcriptomics data.
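To make the invariance property concrete, here is a minimal sketch of why pairwise cell distances are a natural input for an SE(2)-aware model: they do not change when a tissue section is rotated or shifted. This is not SToFM's implementation; the helper names `se2_transform` and `pairwise_distances` and the toy coordinates are assumptions made purely for illustration.

```python
import math

def se2_transform(points, angle, shift):
    """Rotate 2D points by `angle` (radians) about the origin, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = shift
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]

# Hypothetical cell centroids on a tissue section (arbitrary units).
cells = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]

# Apply an arbitrary rigid motion: rotate by 60 degrees, then shift.
moved = se2_transform(cells, angle=math.pi / 3, shift=(10.0, -5.0))

d_before = pairwise_distances(cells)
d_after = pairwise_distances(moved)

# Largest entry-wise change in the distance matrix; zero up to rounding error.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(d_before, d_after)
    for a, b in zip(row_a, row_b)
)
```

A model that only consumes such distances (rather than raw coordinates) gives the same output no matter how the slice was placed on the slide, which is the kind of behaviour the summary attributes to the SE(2) Transformer.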

📝 Abstract
Spatial Transcriptomics (ST) technologies provide biologists with rich insights into single-cell biology by preserving the spatial context of cells. Building foundational models for ST can significantly enhance the analysis of vast and complex data sources, unlocking new perspectives on the intricacies of biological tissues. However, modeling ST data is inherently challenging due to the need to extract multi-scale information from tissue slices containing vast numbers of cells. This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironments, and gene-scale gene expression profiles. To address this challenge, we propose SToFM, a multi-scale Spatial Transcriptomics Foundation Model. SToFM first performs multi-scale information extraction on each ST slice to construct a set of ST sub-slices that aggregate macro-, micro-, and gene-scale information. Then an SE(2) Transformer is used to obtain high-quality cell representations from the sub-slices. Additionally, we construct SToCorpus-88M, the largest high-resolution spatial transcriptomics corpus for pretraining. SToFM achieves outstanding performance on a variety of downstream tasks, such as tissue region semantic segmentation and cell type annotation, demonstrating its comprehensive understanding of ST data.
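The abstract does not spell out how sub-slices are built, so the following is only a rough sketch of the general idea of combining scales: cells are grouped into spatial tiles (a stand-in for the micro-scale neighbourhood), and each tile gets a coarse summary of position and mean expression (a stand-in for macro-scale context). The grid partition, the function names `split_into_sub_slices` and `tile_summary`, and the toy data are all assumptions, not the procedure used by SToFM.

```python
from collections import defaultdict

def split_into_sub_slices(cells, tile_size):
    """Group cells into square tiles by their (x, y) coordinates.

    cells: list of (x, y, expression), where expression is a list of floats.
    Returns a dict mapping a (column, row) tile index to the cells in that tile.
    """
    tiles = defaultdict(list)
    for x, y, expr in cells:
        key = (int(x // tile_size), int(y // tile_size))
        tiles[key].append((x, y, expr))
    return dict(tiles)

def tile_summary(tile):
    """Coarse summary of one tile: centroid and mean expression per gene."""
    n = len(tile)
    cx = sum(cell[0] for cell in tile) / n
    cy = sum(cell[1] for cell in tile) / n
    n_genes = len(tile[0][2])
    mean_expr = [sum(cell[2][g] for cell in tile) / n for g in range(n_genes)]
    return cx, cy, mean_expr

# Hypothetical slice: three cells, two genes each.
toy_cells = [
    (1.0, 1.0, [2.0, 0.0]),
    (3.0, 1.0, [4.0, 2.0]),
    (12.0, 1.0, [0.0, 6.0]),
]

sub_slices = split_into_sub_slices(toy_cells, tile_size=10.0)
summaries = {key: tile_summary(tile) for key, tile in sub_slices.items()}
```

In this toy setup each sub-slice keeps its individual cells with their expression vectors (gene scale and micro scale), while the per-tile summaries could be passed alongside them as coarse context (macro scale).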
Problem

Research questions and friction points this paper is trying to address.

Extract multi-scale information from spatial transcriptomics data
Integrate tissue morphology, cellular microenvironment, and gene expression
Enhance analysis of complex spatial transcriptomics datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale information extraction from ST slices
SE(2) Transformer for cell representations
Pretraining with largest ST corpus SToCorpus-88M
Authors
Suyuan Zhao
Institute for AI Industry Research (AIR), Tsinghua University
Yizhen Luo
Institute for AI Industry Research (AIR), Tsinghua University
Ganbo Yang
Department of Computer Science and Technology, Tsinghua University
Yan Zhong
School of Mathematical Sciences, Peking University
Hao Zhou
Institute for AI Industry Research (AIR), Tsinghua University
Zaiqing Nie
Tsinghua University
NLP, Data Mining, Machine Learning