🤖 AI Summary
Current AI virtual cells (AIVCs) face several challenges: limited transport across laboratories and platforms; data splits prone to leakage and coverage bias; no systematic handling of dose, time, and combination effects; weak multiscale coupling among molecular, cellular, and tissue levels; and inconsistent alignment with scientific or clinical readouts. To address these, we propose a model-agnostic Cell-State Latent (CSL) perspective that organizes learning through an operator grammar (measurement, lift/project for cross-scale coupling, and intervention for dosing and scheduling) and motivates a decision-aligned evaluation blueprint based on function-space readouts such as pathway activity, spatial neighborhoods, and clinically relevant endpoints. The discussion builds on recent work in single-cell and spatial foundation models, cross-modal alignment, perturbation atlas expansion, and pathway-level readouts. We recommend operator-aware data design, leakage-resistant partitions, and transparent calibration and reporting to enable reproducible, like-for-like comparisons.
📝 Abstract
Artificial Intelligence Virtual Cells (AIVCs) aim to learn executable, decision-relevant models of cell state from multimodal, multiscale measurements. Recent studies have introduced single-cell and spatial foundation models, improved cross-modality alignment, scaled perturbation atlases, and explored pathway-level readouts. Nevertheless, although held-out validation is standard practice, evaluations remain predominantly within single datasets and settings; evidence indicates that transport across laboratories and platforms is often limited, that some data splits are vulnerable to leakage and coverage bias, and that dose, time and combination effects are not yet systematically handled. Cross-scale coupling also remains constrained, as anchors linking molecular, cellular and tissue levels are sparse, and alignment to scientific or clinical readouts varies across studies. We propose a model-agnostic Cell-State Latent (CSL) perspective that organizes learning via an operator grammar: measurement, lift/project for cross-scale coupling, and intervention for dosing and scheduling. This view motivates a decision-aligned evaluation blueprint across modality, scale, context and intervention, and emphasizes function-space readouts such as pathway activity, spatial neighborhoods and clinically relevant endpoints. We recommend operator-aware data design, leakage-resistant partitions, and transparent calibration and reporting to enable reproducible, like-for-like comparisons.
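One way to read "leakage-resistant partitions" is as a group-wise holdout: every record from a given laboratory, platform, or perturbation falls entirely on one side of the split, so test performance reflects transport rather than memorization. The sketch below is illustrative only and is not taken from the paper. The function name `group_holdout_split`, the record fields (`cell`, `lab`, `perturbation`), and the toy data are all assumptions introduced for this example.

```python
import random
from collections import defaultdict


def group_holdout_split(records, group_key, test_fraction=0.25, seed=0):
    """Split records so that no group appears in both train and test.

    `group_key` names the field that defines a leakage unit
    (hypothetical examples: "lab", "platform", "perturbation").
    """
    # Bucket records by their group value.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group names deterministically, then hold out whole groups.
    names = sorted(groups)
    random.Random(seed).shuffle(names)
    n_test = max(1, round(len(names) * test_fraction))
    test_groups = set(names[:n_test])

    train = [r for g in names if g not in test_groups for r in groups[g]]
    test = [r for g in names if g in test_groups for r in groups[g]]
    return train, test


# Hypothetical toy records: 24 cells spread over 4 labs and 3 perturbations.
records = [
    {"cell": f"c{i}", "lab": f"lab{i % 4}", "perturbation": f"p{i % 3}"}
    for i in range(24)
]

train, test = group_holdout_split(records, group_key="lab")
train_labs = {r["lab"] for r in train}
test_labs = {r["lab"] for r in test}
print(sorted(train_labs), sorted(test_labs))
```

Splitting on `lab` probes cross-laboratory transport; passing `group_key="perturbation"` instead would hold out unseen perturbations. A random per-cell split, by contrast, would place cells from the same lab and perturbation on both sides, which is the kind of leakage the abstract warns about.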