Parameter-free representations outperform single-cell foundation models on downstream benchmarks

📅 2026-02-18

🤖 AI Summary
This work challenges the prevailing reliance on complex deep learning models in single-cell analysis by asking whether simple, interpretable linear pipelines can match or surpass state-of-the-art single-cell foundation models on downstream tasks. The approach relies on careful normalization and linear methods, and produces gene expression representations without training a deep model. Across multiple benchmarks commonly used to evaluate single-cell foundation models, it achieves state-of-the-art or near state-of-the-art performance, and it outperforms Transformer-based foundation models on out-of-distribution tasks involving cell types and organisms absent from the training data. The results highlight the need for rigorous benchmarking and suggest that simple linear representations can capture the biology of cell identity.

📝 Abstract
Single-cell RNA sequencing (scRNA-seq) data exhibit strong and reproducible statistical structure. This has motivated the development of large-scale foundation models, such as TranscriptFormer, that use transformer-based architectures to learn a generative model for gene expression by embedding genes into a latent vector space. These embeddings have been used to obtain state-of-the-art (SOTA) performance on downstream tasks such as cell-type classification, disease-state prediction, and cross-species learning. Here, we ask whether similar performance can be achieved without utilizing computationally intensive deep learning-based representations. Using simple, interpretable pipelines that rely on careful normalization and linear methods, we obtain SOTA or near-SOTA performance across multiple benchmarks commonly used to evaluate single-cell foundation models, including outperforming foundation models on out-of-distribution tasks involving novel cell types and organisms absent from the training data. Our findings highlight the need for rigorous benchmarking and suggest that the biology of cell identity can be captured by simple linear representations of single-cell gene expression data.
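The abstract does not spell out which normalization steps or linear methods the pipelines use, so the following is only an illustrative sketch of what a training-free representation followed by a simple classifier could look like, not the authors' method. It assumes library-size normalization followed by a log1p transform as the representation, and swaps in a nearest-centroid classifier (whose decision boundaries are linear) as the downstream model. The function names `normalize_counts`, `fit_centroids`, and `predict_centroids`, and the `target_sum` parameter, are hypothetical.

```python
import numpy as np


def normalize_counts(counts, target_sum=1e4):
    """Scale each cell (row) to a common total count, then apply log1p.

    This representation has no learned parameters; target_sum is a fixed
    constant chosen here for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero cells as zeros instead of dividing by 0
    return np.log1p(counts / totals * target_sum)


def fit_centroids(features, labels):
    """Compute the mean feature vector of each class."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_centroids(features, classes, centroids):
    """Assign each cell to the class with the nearest centroid (squared Euclidean)."""
    diffs = features[:, None, :] - centroids[None, :, :]
    distances = (diffs ** 2).sum(axis=2)
    return classes[distances.argmin(axis=1)]


if __name__ == "__main__":
    # Toy cells-by-genes count matrix with two made-up cell types.
    train_counts = np.array([[10, 0, 0], [20, 1, 0], [0, 10, 0], [1, 30, 0]])
    train_labels = ["A", "A", "B", "B"]
    test_counts = np.array([[100, 2, 0], [0, 5, 1]])

    classes, centroids = fit_centroids(normalize_counts(train_counts), train_labels)
    print(predict_centroids(normalize_counts(test_counts), classes, centroids))
```

Because the normalization divides out each cell's total count, the first toy test cell is compared to the training cells by its relative expression profile rather than its sequencing depth, which is one reason careful normalization matters for simple downstream models.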
Problem

Research questions and friction points this paper is trying to address.

single-cell RNA sequencing
foundation models
parameter-free representations
downstream benchmarks
out-of-distribution generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

parameter-free representation
single-cell RNA-seq
linear methods
foundation models
out-of-distribution generalization