Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop

📅 2025-07-14
🤖 AI Summary
Biology lacks standardized, cross-domain benchmarks for AI models, which hinders the development of robust and trustworthy models. This paper reports outcomes from a workshop that convened machine learning and computational biology experts across imaging, transcriptomics, proteomics, and genomics. It identifies technical and systemic bottlenecks, including data heterogeneity and noise, reproducibility challenges, bias, and fragmented public resources. It then recommends high-quality data curation, standardized tooling, comprehensive evaluation metrics, and open collaborative platforms so that models can be compared fairly across tasks and data modalities. The goal is to accelerate robust benchmarks for AI-driven Virtual Cells, improving rigor, reproducibility, and biological relevance in biological AI research.

📝 Abstract
Artificial intelligence holds immense promise for transforming biology, yet a lack of standardized, cross-domain benchmarks undermines our ability to build robust, trustworthy models. Here, we present insights from a recent workshop that convened machine learning and computational biology experts across imaging, transcriptomics, proteomics, and genomics to tackle this gap. We identify major technical and systemic bottlenecks, such as data heterogeneity and noise, reproducibility challenges, biases, and the fragmented ecosystem of publicly available resources, and propose a set of recommendations for building benchmarking frameworks that can efficiently compare ML models of biological systems across tasks and data modalities. By promoting high-quality data curation, standardized tooling, comprehensive evaluation metrics, and open, collaborative platforms, we aim to accelerate the development of robust benchmarks for AI-driven Virtual Cells. These benchmarks are crucial for ensuring rigor, reproducibility, and biological relevance, and will ultimately advance the field toward integrated models that drive new discoveries, therapeutic insights, and a deeper understanding of cellular systems.

Problem

Research questions and friction points this paper is trying to address.

Lack of standardized benchmarks for AI models in biology
Challenges in data heterogeneity, noise, and reproducibility
Need for collaborative platforms to improve model evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized cross-domain benchmarking frameworks
High-quality data curation and tooling
Open collaborative platforms for evaluation
Elizabeth Fahsbender
Chan Zuckerberg Initiative
Alma Andersson
Genentech Inc
Jeremy Ash
Johnson & Johnson Innovative Medicine
Polina Binder
NVIDIA
Daniel Burkhardt
NVIDIA
Benjamin Chang
Sanger Institute
Georg K. Gerber
Harvard Medical School; Division of Computational Pathology, Brigham and Women’s Hospital; Massachusetts Host-Microbiome Center; Harvard-MIT Health Sciences & Technology
Anthony Gitter
Associate Professor, University of Wisconsin-Madison; Morgridge Institute for Research
Computational biology, Bioinformatics
Patrick Godau
German Cancer Research Center (DKFZ); National Center for Tumor Diseases (NCT); Faculty of Mathematics and Computer Science, Heidelberg University
Ankit Gupta
Department of Protein Science, Science for Life Laboratory, KTH Royal Institute of Technology
Genevieve Haliburton
Chan Zuckerberg Initiative
Siyu He
Department of Biomedical Data Science, Stanford
Trey Ideker
University of California San Diego
Cancer, Systems Biology, Networks, Bioinformatics
Ivana Jelic
Formerly Chan Zuckerberg Initiative
Aly Khan
Departments of Pathology and Family Medicine, University of Chicago; Toyota Technological Institute at Chicago; Institute for Population and Precision Health, University of Chicago; Chan Zuckerberg Biohub Chicago
Yang-Joon Kim
Chan Zuckerberg Biohub San Francisco
Aditi Krishnapriyan
Assistant Professor, UC Berkeley
Machine Learning, Numerical Methods, Dynamical Systems, Condensed Matter Physics, Materials Theory
Jon M. Laurent
FutureHouse
Tianyu Liu
Yale University
Emma Lundberg
Associate Professor of Bioengineering and Pathology, Stanford University
Bioimaging, spatial proteomics
Shalin B. Mehta
Chan Zuckerberg Biohub San Francisco
Rob Moccia
Valid, Inc.
Angela Oliveira Pisco
insitro
Computational Biology, Data Science, ML
Katherine S. Pollard
Institute for Human Genetics, University of California, San Francisco; Gladstone Institutes; Department of Epidemiology and Biostatistics, University of California, San Francisco
Suresh Ramani
NVIDIA