Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses modeling single-cell transcriptomic state distributions and predicting genome-wide responses to perturbations, with the goal of generative simulation of virtual cells. It proposes Lingshu-Cell, a masked discrete diffusion model for single-cell transcriptomics that learns dependencies among approximately 18,000 genes directly in a discrete token space, without prior gene selection such as filtering by high variability or ranking by expression level. Conditional generation is supported by jointly embedding cell type or donor identity with the perturbation condition. Across diverse tissues and species, the model reproduces transcriptomic distributions, marker-gene expression patterns, and cell-subtype proportions, and it achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs.
📝 Abstract
Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. Here, we introduce Lingshu-Cell, a masked discrete diffusion model that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that is compatible with the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as filtering by high variability or ranking by expression level. Across diverse tissues and species, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions, demonstrating its ability to capture complex cellular heterogeneity. Moreover, by jointly embedding cell type or donor identity with perturbation, Lingshu-Cell can predict whole-transcriptome expression changes for novel combinations of identity and perturbation. It achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs. Together, these results establish Lingshu-Cell as a flexible cellular world model for in silico simulation of cell states and perturbation responses, laying the foundation for a new paradigm in biological discovery and perturbation screening.
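To make the idea of masked discrete diffusion over gene tokens concrete, here is a minimal NumPy sketch. It is not the Lingshu-Cell implementation: the abstract does not specify the tokenization, mask schedule, or network, so the log-binning in `tokenize_counts`, the `MASK` sentinel, the uniform `toy_denoiser`, and the even reveal schedule in `sample` are all illustrative assumptions.

```python
import numpy as np

MASK = -1  # hypothetical sentinel marking a masked gene token


def tokenize_counts(counts, n_bins=8):
    """Map raw counts to discrete tokens (assumed scheme: log2 binning, zeros stay 0)."""
    counts = np.asarray(counts)
    return np.minimum(np.ceil(np.log2(counts + 1)).astype(int), n_bins - 1)


def forward_mask(tokens, t, rng):
    """Forward process: mask each gene independently with probability t in [0, 1]."""
    masked = tokens.copy()
    masked[rng.random(tokens.shape) < t] = MASK
    return masked


def toy_denoiser(masked, n_bins, rng):
    """Stand-in for a trained network: uniform token probabilities for every gene."""
    return np.full((masked.shape[0], n_bins), 1.0 / n_bins)


def sample(n_genes, n_bins, n_steps, rng, denoiser=toy_denoiser):
    """Reverse process: start fully masked and reveal a share of genes at each step."""
    x = np.full(n_genes, MASK)
    for step in range(n_steps):
        masked_idx = np.flatnonzero(x == MASK)
        if masked_idx.size == 0:
            break
        probs = denoiser(x, n_bins, rng)
        # Reveal enough genes per step that none remain masked after the last step.
        n_reveal = int(np.ceil(masked_idx.size / (n_steps - step)))
        for g in rng.choice(masked_idx, size=n_reveal, replace=False):
            x[g] = rng.choice(n_bins, p=probs[g])
    return x


rng = np.random.default_rng(0)
cell = sample(n_genes=50, n_bins=8, n_steps=5, rng=rng)
```

In a trained model the denoiser would be a network that predicts the masked tokens from the unmasked ones, plus any conditioning such as cell type or perturbation. Because genes are revealed in arbitrary order, the procedure does not impose a sequence on the transcriptome, which is the property the abstract highlights.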
Problem

Research questions and friction points this paper is trying to address.

virtual cells
cellular state modeling
transcriptome simulation
perturbation response
single-cell transcriptomics
Innovation

Methods, ideas, or system contributions that make the work stand out.

masked discrete diffusion
generative cellular world model
single-cell transcriptomics
perturbation response prediction
in silico simulation