🤖 AI Summary
This work addresses pervasive dropout noise in single-cell transcriptomic data, where dropout rates exceed 90%. Under reconstruction objectives, this noise leads existing models to learn technical artifacts rather than stable biological programs. The authors propose Cell-JEPA, the first method to adapt the Joint Embedding Predictive Architecture (JEPA) to single-cell modeling. By predicting complete cell embeddings from partially observed inputs in latent space, Cell-JEPA avoids directly reconstructing sparse, noisy expression counts and instead exploits redundancy across genes to learn dropout-robust representations. The model achieves an AvgBIO score of 0.72 on zero-shot cell-type clustering, a 36% relative improvement over scGPT (0.53). On perturbation response prediction within a single cell line, it improves absolute-state reconstruction but not effect-size estimation.
📝 Abstract
Single-cell foundation models learn by reconstructing masked gene expression, implicitly treating technical noise as signal. With dropout rates exceeding 90%, reconstruction objectives encourage models to encode measurement artifacts rather than stable cellular programs. We introduce Cell-JEPA, a joint-embedding predictive architecture that shifts learning from reconstructing sparse counts to predicting in latent space. The key insight is that cell identity is redundantly encoded across genes. We show that predicting cell-level embeddings from partial observations forces the model to learn dropout-robust features. On cell-type clustering, Cell-JEPA achieves 0.72 AvgBIO in zero-shot transfer versus 0.53 for scGPT, a 36% relative improvement. On perturbation prediction within a single cell line, Cell-JEPA improves absolute-state reconstruction but not effect-size estimation, suggesting that representation learning and perturbation modeling address complementary aspects of cellular prediction.
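To make the latent-space objective concrete, the following is a minimal NumPy sketch of a generic JEPA-style loss on a toy expression matrix. It is not the authors' implementation: the linear encoders, the log1p transform, the random per-cell gene masking, the exponential-moving-average (EMA) target update, and every name in it (`mask_genes`, `jepa_loss`, `ema_update`, `keep_fraction`, `momentum`) are assumptions chosen for illustration. The abstract specifies only that cell-level embeddings are predicted from partial observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, d_latent = 8, 50, 16

# Synthetic sparse counts standing in for a cells x genes expression matrix.
x = rng.poisson(lam=0.3, size=(n_cells, n_genes)).astype(np.float64)

# Hypothetical linear encoders; a real model would use a much deeper network.
w_context = rng.normal(scale=0.1, size=(n_genes, d_latent))
w_target = w_context.copy()      # target encoder starts as a copy of the context encoder
w_predictor = np.eye(d_latent)   # predictor maps context embeddings into target space


def mask_genes(x, keep_fraction, rng):
    """Zero out a random subset of genes per cell to form a partial view."""
    keep = rng.random(x.shape) < keep_fraction
    return x * keep, keep


def jepa_loss(x, w_context, w_target, w_predictor, keep_fraction, rng):
    """Mean squared error between predicted and target cell embeddings.

    The loss is computed in latent space; expression counts are never reconstructed.
    """
    x_partial, _ = mask_genes(x, keep_fraction, rng)
    z_context = np.log1p(x_partial) @ w_context  # embedding of the partial view
    z_target = np.log1p(x) @ w_target            # embedding of the full cell (treated as fixed)
    z_pred = z_context @ w_predictor             # prediction of the full-cell embedding
    return float(np.mean((z_pred - z_target) ** 2))


def ema_update(w_target, w_context, momentum=0.99):
    """Move the target encoder slowly toward the context encoder (assumed EMA scheme)."""
    return momentum * w_target + (1.0 - momentum) * w_context


loss = jepa_loss(x, w_context, w_target, w_predictor, keep_fraction=0.5, rng=rng)
w_target = ema_update(w_target, w_context)
print(f"latent prediction loss: {loss:.4f}")
```

In this sketch the loss never touches individual zero counts directly: a gene that drops out of the partial view contributes only through its effect on the cell-level embedding, which is the property the abstract attributes to predicting in latent space. In a trained model, gradients would flow through the context encoder and predictor only.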