Large Language Models Meet Virtual Cell: A Survey

📅 2025-10-08
🤖 AI Summary
This work addresses the challenge of constructing a “virtual cell”, a computational system that uses large language models (LLMs) to represent, predict, and reason about cellular states and behaviors, across three core tasks: cellular representation learning, perturbation response prediction, and gene regulatory inference. We propose a unified taxonomy distinguishing *predictive* (LLMs as Oracles) and *agentic* (LLMs as Agents) paradigms, and systematically survey existing models, datasets, and evaluation benchmarks. We review critical limitations in scalability, cross-condition generalization, and mechanistic interpretability, and point to biological priors, scientific task orchestration, and multi-step reasoning as ways to address them. Our contribution is threefold: (1) a foundational theoretical framework for virtual cell research; (2) a practical methodology guide grounded in domain-aware LLM design; and (3) a forward-looking roadmap for advancing LLM-driven computational biology. The survey aims to support the shift toward biologically grounded, reasoning-capable AI in systems biology.

📝 Abstract
Large language models (LLMs) are transforming cellular biology by enabling the development of "virtual cells": computational systems that represent, predict, and reason about cellular states and behaviors. This work provides a comprehensive review of LLMs for virtual cell modeling. We propose a unified taxonomy that organizes existing methods into two paradigms: LLMs as Oracles, for direct cellular modeling, and LLMs as Agents, for orchestrating complex scientific tasks. We identify three core tasks (cellular representation, perturbation prediction, and gene regulation inference) and review their associated models, datasets, and evaluation benchmarks, as well as the critical challenges in scalability, generalizability, and interpretability.
Problem

Research questions and friction points this paper is trying to address.

Developing computational systems to model cellular states
Predicting cellular behaviors using large language models
Inferring gene regulation through virtual cell paradigms
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs enable virtual cell computational systems
Two paradigms: LLMs as Oracles and Agents
Three core tasks: cellular representation, perturbation prediction, and gene regulation inference
Authors
Krinos Li
Imperial College London
Xianglu Xiao
Imperial College London
Shenglong Deng
Imperial College London
Lucas He
University College London
Zijun Zhong
Imperial College London
Yuanjie Zou
New Jersey Institute of Technology
Zhonghao Zhan
Cornell University
Networks, Human-Computer Interaction, Data Mining
Zheng Hui
University of Cambridge
Natural Language Processing, LLM Safety & Alignment, Multimodal
Weiye Bao
Imperial College London
Guang Yang
Imperial College London, King’s College London, Royal Brompton Hospital