Cell-ontology guided transcriptome foundation model

📅 2024-08-22
🏛️ Neural Information Processing Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing transcriptome foundation models (TFMs) treat single cells as independent samples and ignore the hierarchical relationships among cell types recorded in cell ontology graphs. As a result, the gene co-expression patterns they learn are not guided by known cell-type taxonomy, which may limit their biological meaningfulness and generalization. To address this, we propose scCello, a single-cell foundation model that explicitly incorporates cell ontology structure during self-supervised pre-training. It adds two ontology-guided objectives: (i) a cell-type coherence loss that encourages cell-type-specific representations, and (ii) an ontology alignment loss that makes the learned embedding space consistent with the relations between cell types in the cell ontology graph. Both are minimized jointly with a masked gene expression prediction loss. Pre-trained on 22 million cells from CellxGene, scCello shows competitive generalization and transferability against existing TFMs on downstream tasks, including zero-shot identification of novel cell types, prediction of cell-type-specific marker genes, and cancer drug response prediction.
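The ontology alignment idea can be illustrated on a toy graph. The sketch below is a minimal, hypothetical stand-in, not scCello's actual formulation: it assumes a hand-written five-term `is_a` hierarchy (not the real Cell Ontology), turns the hop distance d between two terms into a target similarity 1 / (1 + d), and penalizes the squared gap between that target and the cosine similarity of per-cell-type embeddings. The names `IS_A`, `hop_distance`, `ontology_similarity`, and `alignment_loss` are illustrative only.

```python
from collections import deque
from itertools import combinations
import math

# Hypothetical toy ontology: child -> parent ("is_a") edges, NOT the real Cell Ontology.
IS_A = {
    "T cell": "immune cell",
    "B cell": "immune cell",
    "immune cell": "cell",
    "neuron": "cell",
}


def undirected_adjacency(is_a):
    """Build an undirected adjacency list from child -> parent edges."""
    adj = {}
    for child, parent in is_a.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj


def hop_distance(adj, src, dst):
    """Breadth-first search for the number of edges between two ontology terms."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    raise ValueError(f"{dst!r} is unreachable from {src!r}")


def ontology_similarity(adj, a, b):
    """Map graph distance to a target similarity in (0, 1]; closer terms score higher."""
    return 1.0 / (1.0 + hop_distance(adj, a, b))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def alignment_loss(embeddings, adj):
    """Illustrative alignment term: mean squared gap between embedding cosine
    similarity and ontology similarity over all pairs of cell types."""
    pairs = list(combinations(sorted(embeddings), 2))
    return sum(
        (cosine(embeddings[a], embeddings[b]) - ontology_similarity(adj, a, b)) ** 2
        for a, b in pairs
    ) / len(pairs)
```

With `adj = undirected_adjacency(IS_A)`, "T cell" and "B cell" are two hops apart (target similarity 1/3) while "T cell" and "neuron" are three hops apart (target 1/4), so minimizing this term pulls the T cell and B cell embeddings toward a higher cosine similarity than the T cell and neuron embeddings. A real implementation would work on batched tensors and might use a different graph similarity measure.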

📝 Abstract
Transcriptome foundation models (TFMs) hold great promise for deciphering the transcriptomic language that dictates diverse cell functions, by self-supervised learning on large-scale single-cell gene expression data, and ultimately for unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during TFM pre-training can improve the learning of biologically meaningful gene co-expression patterns while preserving the TFM as a general-purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present the single-cell, Cell-ontology guided TFM, scCello. We introduce a cell-type coherence loss and an ontology alignment loss, which are minimized along with the masked gene expression prediction loss during pre-training. These novel loss components guide scCello to learn cell-type-specific representations and the structural relations between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over existing TFMs on biologically important tasks, including identifying novel cell types of unseen cells, predicting cell-type-specific marker genes, and predicting cancer drug responses.
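The abstract states that the two ontology-guided losses are minimized together with the masked gene expression prediction loss. A minimal pure-Python sketch of such a joint objective follows, under loudly stated assumptions: the masked-gene term is written as token-level cross-entropy over masked positions only, the coherence term is a hypothetical prototype classifier (one vector per cell type, dot-product logits), and the alignment loss is passed in as a precomputed number. The weights `w_coherence` and `w_alignment` and all function names are illustrative, not values or APIs from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def masked_gene_loss(logits_per_position, targets, mask):
    """Assumed masked-prediction form: cross-entropy averaged over masked positions only."""
    terms = [
        -log_softmax(logits)[target]
        for logits, target, masked in zip(logits_per_position, targets, mask)
        if masked
    ]
    if not terms:
        raise ValueError("no masked positions")
    return sum(terms) / len(terms)


def coherence_loss(cell_embedding, prototypes, cell_type):
    """Hypothetical coherence term: classify the cell embedding against one prototype
    per cell type via dot-product logits, so same-type cells cluster together."""
    logits = [sum(x * y for x, y in zip(cell_embedding, p)) for p in prototypes]
    return -log_softmax(logits)[cell_type]


def total_loss(l_mgm, l_coherence, l_alignment, w_coherence=1.0, w_alignment=1.0):
    """Weighted sum minimized jointly during pre-training; the weights are assumptions."""
    return l_mgm + w_coherence * l_coherence + w_alignment * l_alignment
```

Calling `total_loss(masked_gene_loss(...), coherence_loss(...), l_alignment)` yields one scalar per cell. In practice these terms would be averaged over a mini-batch and computed in an autodiff framework so that gradients flow back into the encoder.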
Problem

Research questions and friction points this paper is trying to address.

Leveraging cell ontology to improve transcriptome foundation models
Enhancing gene co-expression pattern learning with cell-type relationships
Improving generalization for cell type identification and drug response prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cell-ontology guided transcriptome foundation model
Cell-type coherence and ontology alignment losses
Pre-trained on 22 million cells from CellxGene