TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology

📅 2025-03-05

🤖 AI Summary
Current single-cell foundation models match or only modestly outperform task-specific models on downstream tasks, which limits their value for dissecting disease mechanisms and for drug discovery. To address this, we propose a family of general-purpose foundation models for single-cell biology. Our approach combines a large single-cell pretraining corpus (116 million cells, larger than those used by previous models) with biologically informed, annotation-guided supervision during pretraining, and characterizes how performance scales with data volume and parameter count. Using a Transformer architecture, we train six models at three scales (70M, 160M, and 400M parameters) and use large-scale biological annotations to enrich the pretraining objective. On identifying the disease state of unseen donors, a key clinical challenge, our models substantially outperform existing methods. On classifying healthy versus diseased cells for unseen conditions and donors, the improvements are more muted.

📝 Abstract
Understanding the biological mechanism of disease is critical for medicine, and in particular drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models either do not improve or only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving the state-of-the-art. First, we scaled the pre-training dataset to 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the TEDDY family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on two downstream evaluation tasks: identifying the underlying disease state of held-out donors not seen during training, and distinguishing healthy cells from diseased ones for disease conditions and donors not seen during training. Scaling experiments showed that performance improved predictably with both data volume and parameter count. Our models showed substantial improvement over existing work on the first task and more muted improvements on the second.
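Both evaluation tasks in the abstract rely on donors that were not seen during training, which implies splitting the data by donor rather than by cell. The sketch below illustrates that idea only; it is not the authors' pipeline. The function name `split_by_donor`, the `test_fraction` parameter, and the toy donor labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_donor(cell_donors, test_fraction=0.2, seed=0):
    """Split cell indices so that no donor appears in both train and test.

    cell_donors: list with one donor ID per cell (hypothetical input format).
    """
    # Group cell indices by the donor they came from.
    by_donor = defaultdict(list)
    for idx, donor in enumerate(cell_donors):
        by_donor[donor].append(idx)

    # Shuffle donors (not cells) and hold out a fraction of them.
    donors = sorted(by_donor)
    random.Random(seed).shuffle(donors)
    n_test = max(1, round(len(donors) * test_fraction))
    held_out = set(donors[:n_test])

    train_idx = sorted(i for d in donors if d not in held_out for i in by_donor[d])
    test_idx = sorted(i for d in held_out for i in by_donor[d])
    return train_idx, test_idx


# Toy, made-up donor labels for eight cells (not data from the paper).
cell_donors = ["d1", "d1", "d2", "d3", "d3", "d3", "d4", "d5"]
train_idx, test_idx = split_by_donor(cell_donors)
```

Splitting at the donor level keeps every cell from a held-out donor out of training, so test performance reflects generalization across individuals rather than across cells from the same person.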
Problem

Research questions and friction points this paper is trying to address.

Improving single-cell biology understanding via large-scale foundation models.
Enhancing disease state identification using scaled pre-training datasets.
Leveraging biological annotations for better model supervision and performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaled pre-training dataset to 116 million cells
Leveraged large-scale biological annotations for supervision
Trained transformer-based models with up to 400M parameters