HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing spatial transcriptomics models treat spatial architecture and gene regulation separately, overlooking how the microenvironment shapes gene regulation across levels. To address this, the authors propose HEIST, a hierarchical graph-based foundation model for spatial transcriptomics and proteomics. It represents tissue as a cell-neighborhood graph and each individual cell as a gene regulatory network graph, and it jointly encodes microenvironmental context and molecular regulation with a cross-level graph transformer. Pre-training combines spatially-aware contrastive learning with masked auto-encoding. Pre-trained on 22.3 million cells from 124 tissues across 15 organs, the model reports state-of-the-art results in clinical outcome prediction, cell-type annotation, gene imputation, and spatially-informed cell clustering. It also uncovers spatially-informed cell subpopulations that prior models fail to differentiate.

📝 Abstract
Single-cell transcriptomics has become a rich source of data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and transcriptional regulation at the single-cell level. Spatial transcriptomics data promise to reveal how cells behave within their tissue context, since they provide both spatial coordinates and transcriptomic readouts. However, existing models ignore either spatial resolution or gene regulatory information. Gene regulation in cells can change depending on microenvironmental cues from neighboring cells, but existing models neglect gene regulatory patterns with hierarchical dependencies across levels of abstraction. To create contextualized representations of cells and genes from spatial transcriptomics data, we introduce HEIST, a hierarchical graph transformer-based foundation model for spatial transcriptomics and proteomics data. HEIST models tissue as spatial cellular neighborhood graphs, and each cell is, in turn, modeled as a gene regulatory network graph. The framework includes a hierarchical graph transformer that performs message passing both across levels and within each level. HEIST is pre-trained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive learning and masked auto-encoding objectives. Unsupervised analysis of HEIST cell representations shows that the model effectively encodes microenvironmental influences in cell embeddings, enabling the discovery of spatially-informed subpopulations that prior models fail to differentiate. Further, HEIST achieves state-of-the-art results on four downstream tasks (clinical outcome prediction, cell type annotation, gene imputation, and spatially-informed cell clustering) across multiple technologies, highlighting the importance of hierarchical modeling and GRN-based representations.
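To make the two-level representation described in the abstract more concrete, here is a minimal sketch of how such a hierarchy could be assembled with NumPy. It is an illustration under stated assumptions, not the authors' construction: the abstract does not say how either graph is built, so the k-nearest-neighbor rule for the cell graph, the co-expression threshold used as a stand-in for a gene regulatory network, and the function names `knn_cell_graph` and `coexpression_gene_graph` are all hypothetical.

```python
import numpy as np


def knn_cell_graph(coords, k=3):
    """Upper level (assumed kNN rule): link each cell to its k nearest
    spatial neighbors. Returns a directed edge list of shape (n_cells * k, 2)."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all cells.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    sources = np.repeat(np.arange(len(coords)), k)
    return np.stack([sources, neighbors.ravel()], axis=1)


def coexpression_gene_graph(expr, threshold=0.5):
    """Lower level (hypothetical stand-in for a GRN): link two genes when the
    absolute Pearson correlation of their expression across cells exceeds
    `threshold`. `expr` has shape (n_cells, n_genes)."""
    corr = np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)
    corr = np.nan_to_num(corr)  # constant genes yield NaN correlations
    np.fill_diagonal(corr, 0.0)  # drop self-edges
    gene_i, gene_j = np.where(np.abs(corr) > threshold)
    return np.stack([gene_i, gene_j], axis=1)
```

In this sketch every cell shares one gene-graph topology and would differ only in its node features (its own expression values); whether HEIST shares or varies the gene graph per cell is not stated in the abstract.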
Problem

Research questions and friction points this paper is trying to address.

Modeling spatial and gene regulatory data in transcriptomics
Capturing hierarchical dependencies in gene regulation
Improving cell subpopulation discovery and downstream tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical graph transformer for spatial data
Modeling tissue as spatial cellular neighborhood graphs, with each cell as a gene regulatory network graph
Pre-training with spatially-aware contrastive learning and masked auto-encoding
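
The two pre-training objectives named above can be sketched in generic form. The code below shows a standard masked-reconstruction loss and a standard InfoNCE contrastive loss in NumPy; it is not HEIST's actual objective, whose spatially-aware formulation is not given in this summary. The function names, the mask rate of 0.15, the zero-fill masking, and the temperature of 0.1 are all assumptions chosen for illustration.

```python
import numpy as np


def mask_expression(expr, mask_rate=0.15, seed=0):
    """Randomly hide a fraction of expression entries (assumed zero-fill).
    Returns the masked matrix and the boolean mask of hidden entries."""
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    mask = rng.random(expr.shape) < mask_rate
    return np.where(mask, 0.0, expr), mask


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error computed on hidden entries only."""
    if not mask.any():
        return 0.0
    return float(((pred - target)[mask] ** 2).mean())


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE: row i of `positives` is the positive for row i of
    `anchors`; every other row acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_prob).mean())
```

One plausible way to make the contrastive term spatially aware would be to draw each cell's positive from its spatial neighbors, but that pairing rule is a guess and would need to be checked against the paper.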