🤖 AI Summary
Lack of standardized evaluation protocols for single-cell perturbation effect prediction hinders model comparability and biological interpretability. To address this, we introduce PerturBench, a standardized benchmarking framework that integrates diverse CRISPR-based and drug perturbation datasets, an accessible benchmarking platform, and a multi-dimensional evaluation suite balancing reconstruction accuracy (e.g., RMSE) with ranking fidelity (e.g., rank-based consistency). Systematic, reproducible evaluation of published and baseline models uncovers pervasive failure modes, including mode collapse, and shows that simple models often outperform complex ones, underscoring the importance of ranking metrics alongside conventional error measures. PerturBench improves model robustness and cross-study comparability, providing a rigorous, reproducible foundation for genetic and chemical screening-driven target discovery.
📝 Abstract
We present a comprehensive framework for predicting the effects of perturbations in single cells, designed to standardize benchmarking in this rapidly evolving field. Our framework, PerturBench, includes a user-friendly platform, diverse datasets, metrics for fair model comparison, and detailed performance analysis. Extensive evaluations of published and baseline models reveal limitations such as mode or posterior collapse, and underscore the importance of rank metrics that assess the ordering of perturbations alongside traditional measures like RMSE. Our findings show that simple models can outperform more complex approaches. This benchmarking exercise sets new standards for model evaluation, supports robust model development, and advances the potential of these models to leverage high-throughput, high-content genetic and chemical screens for disease target discovery.