Cell-JEPA: Latent Representation Learning for Single-Cell Transcriptomics

📅 2026-02-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge posed by pervasive dropout in single-cell transcriptomic data, where dropout rates exceed 90% and reconstruction objectives lead existing models to learn technical artifacts rather than stable biological programs. To overcome this, the authors propose Cell-JEPA, the first method to adapt the Joint Embedding Predictive Architecture (JEPA) to single-cell modeling. By predicting complete cell embeddings from partially observed inputs in a latent space, Cell-JEPA avoids direct reconstruction of sparse, noisy expression counts and instead exploits the redundancy of cell identity across genes to learn representations robust to dropout. The model achieves an AvgBIO score of 0.72 on zero-shot cell-type clustering, a 36% relative improvement over scGPT. On perturbation response prediction within a single cell line, it improves absolute-state reconstruction but not effect-size estimation.
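The latent-prediction idea described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the linear encoders, the random gene masking, the mean-squared-error loss, and the exponential-moving-average (EMA) target encoder are all assumptions borrowed from the general JEPA family, and every name below (`mask_genes`, `latent_prediction_loss`, `ema_update`) is hypothetical.

```python
import random

def linear(weights, x):
    """Apply a bias-free dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def random_matrix(rows, cols, rng):
    """Small random weights; stands in for a real encoder (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def mask_genes(expression, keep_fraction, rng):
    """Zero out a random subset of genes to form the partial (context) view."""
    n_keep = max(1, int(len(expression) * keep_fraction))
    kept = set(rng.sample(range(len(expression)), n_keep))
    return [v if i in kept else 0.0 for i, v in enumerate(expression)]

def latent_prediction_loss(expression, context_enc, target_enc, predictor,
                           keep_fraction, rng):
    """MSE between the predicted and the target cell embedding.

    The loss lives in embedding space; raw counts are never reconstructed.
    """
    context_view = mask_genes(expression, keep_fraction, rng)
    predicted = linear(predictor, linear(context_enc, context_view))
    target = linear(target_enc, expression)  # would be held constant in training
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def ema_update(target_enc, context_enc, momentum=0.99):
    """Move the target encoder slowly toward the context encoder (assumed EMA)."""
    return [[momentum * t + (1.0 - momentum) * c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(target_enc, context_enc)]

rng = random.Random(0)
n_genes, embed_dim = 50, 8

# Toy sparse cell: most genes read zero, a few carry small counts.
cell = [float(rng.randint(1, 5)) if rng.random() < 0.1 else 0.0
        for _ in range(n_genes)]

context_encoder = random_matrix(embed_dim, n_genes, rng)
target_encoder = [row[:] for row in context_encoder]  # starts as a copy
predictor = random_matrix(embed_dim, embed_dim, rng)

loss = latent_prediction_loss(cell, context_encoder, target_encoder,
                              predictor, keep_fraction=0.3, rng=rng)
target_encoder = ema_update(target_encoder, context_encoder)
print(f"latent prediction loss: {loss:.4f}")
```

In a real training loop the context encoder and predictor would be updated by gradient descent on this loss, while the target encoder would only change through the EMA step. The sketch computes a single forward pass to show where the objective is defined.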

📝 Abstract
Single-cell foundation models learn by reconstructing masked gene expression, implicitly treating technical noise as signal. With dropout rates exceeding 90%, reconstruction objectives encourage models to encode measurement artifacts rather than stable cellular programs. We introduce Cell-JEPA, a joint-embedding predictive architecture that shifts learning from reconstructing sparse counts to predicting in latent space. The key insight is that cell identity is redundantly encoded across genes. We show that predicting cell-level embeddings from partial observations forces the model to learn dropout-robust features. On cell-type clustering, Cell-JEPA achieves 0.72 AvgBIO in zero-shot transfer versus 0.53 for scGPT, a 36% relative improvement. On perturbation prediction within a single cell line, Cell-JEPA improves absolute-state reconstruction but not effect-size estimation, suggesting that representation learning and perturbation modeling address complementary aspects of cellular prediction.
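The 36% figure in the abstract is a relative improvement, not an absolute one. The short check below reproduces it from the two reported AvgBIO scores. The helper name `relative_improvement` is illustrative, and only the two scores come from the abstract.

```python
def relative_improvement(new_score, baseline_score):
    """Gain of new_score over baseline_score, as a fraction of the baseline."""
    return (new_score - baseline_score) / baseline_score

cell_jepa_avgbio = 0.72  # reported zero-shot AvgBIO for Cell-JEPA
scgpt_avgbio = 0.53      # reported zero-shot AvgBIO for scGPT

absolute_gain = cell_jepa_avgbio - scgpt_avgbio
relative_gain = relative_improvement(cell_jepa_avgbio, scgpt_avgbio)

print(f"absolute gain: {absolute_gain:.2f} AvgBIO")   # absolute gain: 0.19 AvgBIO
print(f"relative improvement: {relative_gain:.1%}")   # relative improvement: 35.8%
```

The absolute gap is 0.19 AvgBIO, and dividing by the scGPT baseline of 0.53 gives about 35.8%, which rounds to the stated 36%.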
Problem

Research questions and friction points this paper is trying to address.

single-cell transcriptomics
technical noise
dropout
representation learning
foundation models
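
The dropout problem named above, and the abstract's point that cell identity is redundantly encoded across genes, can be illustrated with a toy simulation. This is a sketch under simplifying assumptions that are not from the paper: two cell types, 200 marker genes per type (hypothetical), a fixed expression level, no background noise, and independent dropout at a rate of 0.9. It compares identifying the cell type from one marker gene per type with identifying it from the whole gene program.

```python
import random

MARKERS_PER_TYPE = 200  # hypothetical size of each cell type's gene program
DROPOUT_RATE = 0.9      # toy value at the edge of the ">90%" regime

def simulate_cell(cell_type, rng):
    """Observed counts: genes [0, 200) mark type 0, genes [200, 400) mark type 1."""
    counts = []
    for gene in range(2 * MARKERS_PER_TYPE):
        is_marker = (gene < MARKERS_PER_TYPE) == (cell_type == 0)
        expressed = 5.0 if is_marker else 0.0
        # Dropout: an expressed gene is recorded as zero with probability DROPOUT_RATE.
        counts.append(expressed if rng.random() >= DROPOUT_RATE else 0.0)
    return counts

def classify(counts, genes_per_type, rng):
    """Compare summed counts over the first genes_per_type markers of each type."""
    score0 = sum(counts[:genes_per_type])
    score1 = sum(counts[MARKERS_PER_TYPE:MARKERS_PER_TYPE + genes_per_type])
    if score0 == score1:
        return rng.randint(0, 1)  # no evidence either way: guess
    return 0 if score0 > score1 else 1

def accuracy(genes_per_type, n_cells=2000, seed=0):
    """Fraction of simulated cells whose type is recovered correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_cells):
        cell_type = rng.randint(0, 1)
        predicted = classify(simulate_cell(cell_type, rng), genes_per_type, rng)
        correct += predicted == cell_type
    return correct / n_cells

single_gene_acc = accuracy(genes_per_type=1)
program_acc = accuracy(genes_per_type=MARKERS_PER_TYPE)
print(f"one marker gene per type: {single_gene_acc:.2f}")
print(f"full gene program:        {program_acc:.2f}")
```

With one marker per type, the relevant gene is usually dropped and the classifier mostly guesses. Pooling over the whole program makes it very unlikely that every marker is missing at once, so the cell type is almost always recovered. Real data adds background expression and correlated noise, which this toy deliberately omits.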
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cell-JEPA
joint-embedding predictive architecture
latent representation learning
dropout-robust features
single-cell transcriptomics
Authors
Ali ElSheikh
Northwestern University
Rui-Xi Wang
Massachusetts Institute of Technology
Weimin Wu
Ph.D. Candidate in Computer Science, Northwestern University
AI for Biology, ML Theory
Yibo Wen
Northwestern University
Payam Dibaeinia
Biohub
Jennifer Yuntong Zhang
University of Toronto
J. Hu
Northwestern University
Mei Knudson
Biohub
Sudarshan Babu
Biohub
Shao-Hua Sun
Assistant Professor at National Taiwan University
Machine Learning, Robot Learning, Reinforcement Learning, Program Synthesis
Aly A. Khan
Biohub
Han Liu
Orrington Lunt Professor of Computer Science, Statistics and Data Science, Northwestern University
Machine Learning, Large Foundation Models for AI, AI for Science and Finance