🤖 AI Summary
Current virtual cell models rely heavily on large-scale single-cell data, so their generalizability is limited by data quality issues and batch effects. They are also predominantly black-box systems that lack interpretability and biological consistency. To address these limitations, we propose the first biologically grounded world model framework for cellular response prediction. Our approach integrates knowledge graphs, stepwise causal reasoning by large language models, and dynamic simulation of signaling pathways to construct an interpretable, data-efficient white-box cellular simulator. The model supports stepwise mechanistic inference and hypothesis generation under molecular perturbations. In drug perturbation prediction tasks, it achieves state-of-the-art performance, and the inferred signaling pathways are strongly concordant with established biological evidence. This improves the scientific credibility and mechanistic interpretability of virtual cell models and supports hypothesis-driven discovery in systems pharmacology and cell biology.
📝 Abstract
Virtual cell modeling aims to predict cellular responses to perturbations. Existing virtual cell models rely heavily on large-scale single-cell datasets, learning explicit mappings from perturbations to gene expression. Although recent models attempt to incorporate multi-source biological information, their generalization remains constrained by data quality, coverage, and batch effects. More critically, these models often function as black boxes, offering predictions without interpretability or consistency with biological principles, which undermines their credibility in scientific research. To address these challenges, we present VCWorld, a cell-level white-box simulator that integrates structured biological knowledge with the iterative reasoning capabilities of large language models to instantiate a biological world model. VCWorld operates in a data-efficient manner to reproduce perturbation-induced signaling cascades, and it generates interpretable, stepwise predictions alongside explicit mechanistic hypotheses. In drug perturbation benchmarks, VCWorld achieves state-of-the-art predictive performance, and the inferred mechanistic pathways are consistent with publicly available biological evidence.