🤖 AI Summary
This paper addresses a central problem in single-cell perturbation prediction: forecasting the effect of perturbing a gene that was not seen during training, and quantifying how far that forecast can be trusted. It introduces PRESCRIBE (PREdicting Single-Cell Response wIth Bayesian Estimation), a multivariate deep evidential regression framework that jointly models epistemic uncertainty (arising when the target gene is dissimilar to the genes covered in training) and aleatoric uncertainty (stemming from the quality of the corresponding training data). For each prediction, PRESCRIBE estimates a confidence score that correlates strongly with empirical accuracy. Filtering out low-confidence predictions with this score yields steady accuracy improvements of over 3% compared to comparable baselines in the authors' experiments, which makes the retained predictions more reliable in practice.
📝 Abstract
In single-cell perturbation prediction, a central task is to forecast the effects of perturbing a gene unseen in the training data. The efficacy of such predictions depends on two factors: (1) the similarity of the target gene to those covered in the training data, which informs model (epistemic) uncertainty, and (2) the quality of the corresponding training data, which reflects data (aleatoric) uncertainty. Both factors are critical for determining the reliability of a prediction, particularly as gene perturbation is an inherently stochastic biochemical process. In this paper, we propose PRESCRIBE (PREdicting Single-Cell Response wIth Bayesian Estimation), a multivariate deep evidential regression framework designed to measure both sources of uncertainty jointly. Our analysis demonstrates that PRESCRIBE effectively estimates a confidence score for each prediction, which strongly correlates with its empirical accuracy. This capability enables the filtering of untrustworthy results, and in our experiments, it achieves steady accuracy improvements of over 3% compared to comparable baselines.
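To make the two uncertainty sources and the filtering step concrete, here is a minimal sketch based on the standard univariate deep evidential regression formulation, in which a network outputs Normal-Inverse-Gamma parameters (gamma, nu, alpha, beta) for each prediction. This is not PRESCRIBE's multivariate model: the `confidence` mapping, the threshold, and the gene names and parameter values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class NIGParams:
    """Normal-Inverse-Gamma evidential parameters for one scalar prediction."""
    gamma: float  # predicted mean
    nu: float     # evidence for the mean, must be > 0
    alpha: float  # inverse-gamma shape, must be > 1 for finite variance
    beta: float   # inverse-gamma scale, must be > 0


def uncertainties(p: NIGParams) -> tuple[float, float]:
    """Return (aleatoric, epistemic) variance under the univariate NIG model."""
    if p.nu <= 0 or p.alpha <= 1 or p.beta <= 0:
        raise ValueError("need nu > 0, alpha > 1, beta > 0")
    aleatoric = p.beta / (p.alpha - 1.0)  # expected data noise, E[sigma^2]
    epistemic = aleatoric / p.nu          # variance of the mean, Var[mu]
    return aleatoric, epistemic


def confidence(p: NIGParams) -> float:
    """Hypothetical score in (0, 1]; higher means lower total uncertainty."""
    aleatoric, epistemic = uncertainties(p)
    return 1.0 / (1.0 + aleatoric + epistemic)


def filter_confident(preds: dict[str, NIGParams], threshold: float) -> list[str]:
    """Keep the names of predictions whose confidence meets the threshold."""
    return [name for name, p in preds.items() if confidence(p) >= threshold]


# Made-up example: one well-supported prediction, one poorly supported one.
preds = {
    "geneA": NIGParams(gamma=0.8, nu=10.0, alpha=3.0, beta=0.2),
    "geneB": NIGParams(gamma=0.1, nu=0.5, alpha=1.5, beta=1.0),
}
kept = filter_confident(preds, threshold=0.5)
```

In this toy setup `geneB` has little evidence (small `nu`) and a high noise estimate, so both uncertainty terms are large and it falls below the threshold. Only `geneA` is retained.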