🤖 AI Summary
This work addresses the challenge of constructing a “virtual cell”: a computational system that leverages large language models (LLMs) to represent, predict, and reason about cellular states and behaviors. It covers three core tasks: cellular representation learning, perturbation response prediction, and gene regulation inference. We propose a unified taxonomy distinguishing the *predictive* (LLMs as Oracles) and *agentic* (LLMs as Agents) paradigms, and systematically survey the models, datasets, and evaluation benchmarks associated with each task. To tackle critical limitations in scalability, cross-condition generalization, and mechanistic interpretability, we examine how biological priors, scientific task orchestration, and multi-step reasoning techniques can be integrated into LLM-based modeling. Our contribution is threefold: (1) a foundational conceptual framework for virtual cell research; (2) a practical methodology guide grounded in domain-aware LLM design; and (3) a forward-looking roadmap for advancing LLM-driven computational biology. Together, these aim to accelerate the shift toward biologically grounded, reasoning-capable AI in systems biology.
📝 Abstract
Large language models (LLMs) are transforming cellular biology by enabling the development of "virtual cells"--computational systems that represent, predict, and reason about cellular states and behaviors. This work provides a comprehensive review of LLMs for virtual cell modeling. We propose a unified taxonomy that organizes existing methods into two paradigms: LLMs as Oracles, for direct cellular modeling, and LLMs as Agents, for orchestrating complex scientific tasks. We identify three core tasks--cellular representation, perturbation prediction, and gene regulation inference--and for each review the associated models, datasets, and evaluation benchmarks, as well as the critical challenges in scalability, generalizability, and interpretability.
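To make the Oracle/Agent distinction concrete, the sketch below contrasts the two paradigms on two of the core tasks: an Oracle-style single-shot perturbation prediction and an Agent-style multi-step loop for gene regulation inference. This is a minimal illustration, not the paper's method: the `CellState` type, the prompt formats, and the `llm` and `tools` callables are hypothetical stand-ins for a real model and toolset.

```python
from dataclasses import dataclass
from typing import Callable

# --- LLMs as Oracles: the model directly maps cellular inputs to predictions. ---

@dataclass
class CellState:
    """A toy cellular representation: gene names mapped to expression values."""
    expression: dict[str, float]

def oracle_predict_perturbation(llm: Callable[[str], str],
                                cell: CellState,
                                perturbation: str) -> str:
    """Single-shot prediction: serialize the cell state into a prompt and ask
    the model for the post-perturbation expression profile (hypothetical format)."""
    genes = ", ".join(f"{g}={v:.2f}" for g, v in cell.expression.items())
    prompt = (f"Cell expression profile: {genes}. "
              f"Predict the expression changes after perturbation: {perturbation}.")
    return llm(prompt)

# --- LLMs as Agents: the model orchestrates tools over multiple reasoning steps. ---

def agent_infer_regulation(llm: Callable[[str], str],
                           tools: dict[str, Callable[[str], str]],
                           question: str,
                           max_steps: int = 5) -> str:
    """Multi-step loop: the model picks a tool (e.g. a dataset query or a
    pathway lookup), observes the result, and iterates until it answers."""
    transcript = question
    for _ in range(max_steps):
        decision = llm(f"{transcript}\nNext action ('tool_name:query') or 'FINAL:answer'?")
        if decision.startswith("FINAL:"):
            return decision.removeprefix("FINAL:").strip()
        tool_name, _, query = decision.partition(":")
        observation = tools.get(tool_name.strip(), lambda q: "unknown tool")(query)
        transcript += f"\nAction: {decision}\nObservation: {observation}"
    return "no answer within step budget"
```

The design difference is the unit of work: the Oracle paradigm treats the LLM as a predictor over serialized cellular data, while the Agent paradigm treats it as a controller whose intermediate tool calls and observations form an auditable reasoning trace.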