🤖 AI Summary
This work addresses a challenge in virtual cell modeling: prediction failures are often difficult to trace back to their root causes (assumptions, representations, implementations, or constraints), which makes model iteration inefficient. To overcome this, the authors propose CellScientist, a dual-space hierarchical framework that couples a high-level hypothesis space with a low-level executable implementation space. Modeling decisions are encoded as structured states, and a cross-level feedback mechanism translates prediction discrepancies into traceable correction signals, establishing a closed refinement loop: hypothesis → implementation → hypothesis. The approach generates admissible programs under task and interface constraints and supports audit-driven model refinement. Across morphology and transcriptomic benchmarks, with additional single-cell perturbation evaluations, the final selected models improve over reference baselines under fixed split and evaluation protocols, while the workflow produces auditable refinement traces.
📝 Abstract
Virtual Cell Modeling (VCM) requires models that not only predict perturbation responses, but also support targeted revision when predictions fail. Current LLM-assisted modeling workflows face a refinement-routing problem: prediction discrepancies are observed through executable implementations, but the relevant revision may involve the modeling assumption, representation design, implementation, or task constraint. Without structured feedback propagation across these levels, iterative refinement may repair code while failing to revise the assumption responsible for the discrepancy. We propose CellScientist, a dual-space hierarchical framework that couples a high-level hypothesis space with a low-level executable implementation space. CellScientist represents modeling decisions as structured states, realizes them as admissible programs under task and interface constraints, and routes execution discrepancies back to targeted hypothesis or implementation updates. This enables a closed Hypothesis -> Implementation -> Hypothesis loop where failures become structured signals for model refinement rather than debugging events. Across morphology and transcriptomic benchmarks, with additional single-cell perturbation evaluations, the final executable models selected by CellScientist improve over reference baselines under fixed split and evaluation protocols, while the workflow produces auditable refinement traces.
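The refinement-routing idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from CellScientist: the names `ModelState`, `Discrepancy`, `route`, and `refine`, the toy routing rule (a crash goes to the implementation level, a high validation error goes to the hypothesis level), and the tolerance value. Admissibility checks against task and interface constraints are omitted for brevity.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Level(Enum):
    """The two spaces a revision can target."""
    HYPOTHESIS = "hypothesis"
    IMPLEMENTATION = "implementation"


@dataclass(frozen=True)
class ModelState:
    """Hypothetical structured state: one modeling decision per field."""
    assumption: str      # high-level modeling assumption
    representation: str  # representation design
    program: str         # identifier of the executable implementation


@dataclass(frozen=True)
class Discrepancy:
    """Hypothetical outcome observed after executing a state."""
    crashed: bool  # the program failed to run
    error: float   # prediction error on a fixed validation split


def route(d: Discrepancy, tolerance: float) -> Optional[Level]:
    """Toy routing rule (an assumption, not the paper's mechanism)."""
    if d.crashed:
        return Level.IMPLEMENTATION  # repair the code first
    if d.error > tolerance:
        return Level.HYPOTHESIS      # code runs, so revisit the assumption
    return None                      # within tolerance: stop refining


def refine(
    state: ModelState,
    execute: Callable[[ModelState], Discrepancy],
    revise_hypothesis: Callable[[ModelState], ModelState],
    repair_program: Callable[[ModelState], ModelState],
    tolerance: float = 0.1,
    max_steps: int = 5,
) -> Tuple[ModelState, List[Tuple[ModelState, Optional[Level]]]]:
    """Closed loop that records each (state, routed level) pair as a trace."""
    trace: List[Tuple[ModelState, Optional[Level]]] = []
    for _ in range(max_steps):
        target = route(execute(state), tolerance)
        trace.append((state, target))
        if target is None:
            break
        if target is Level.HYPOTHESIS:
            state = revise_hypothesis(state)
        else:
            state = repair_program(state)
    return state, trace


def toy_execute(s: ModelState) -> Discrepancy:
    """Stand-in for running a model; all values are made up."""
    if s.program == "v0":
        return Discrepancy(crashed=True, error=float("inf"))
    return Discrepancy(False, 0.4 if s.assumption == "additive" else 0.05)


start = ModelState("additive", "gene-level", "v0")
final, trace = refine(
    start,
    toy_execute,
    revise_hypothesis=lambda s: replace(s, assumption="interaction"),
    repair_program=lambda s: replace(s, program="v1"),
)
```

In this toy run the first discrepancy is routed to the implementation level, the second to the hypothesis level, and the third stops the loop. The recorded `trace` plays the role of an auditable refinement history, since each entry pairs a state with the level its discrepancy was routed to.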