AI Summary
Genomic perturbation experiments for drug discovery face challenges including high-dimensional search spaces, prohibitive experimental costs, and limited biological interpretability. To address these, we propose BioBO, a Bayesian optimization framework that integrates multimodal gene embeddings with pathway enrichment analysis, explicitly encoding biological prior knowledge into both the surrogate model and acquisition function. Our key innovation is an uncertainty-aware, pathway-guided acquisition strategy that jointly enables biologically informed perturbation design and mechanistic interpretability at the pathway level. Evaluated on multiple public functional genomics datasets, BioBO improves labeling efficiency by 25-40% over state-of-the-art baselines, significantly accelerating optimal target identification while providing interpretable, pathway-level regulatory insights.
Abstract
Efficient design of genomic perturbation experiments is crucial for accelerating drug discovery and therapeutic target identification, yet exhaustive perturbation of the human genome remains infeasible due to the vast search space of potential genetic interactions and experimental constraints. Bayesian optimization (BO) has emerged as a powerful framework for selecting informative interventions, but existing approaches often fail to exploit domain-specific biological prior knowledge. We propose Biology-Informed Bayesian Optimization (BioBO), a method that integrates Bayesian optimization with multimodal gene embeddings and enrichment analysis, a widely used tool for gene prioritization in biology, to enhance surrogate modeling and acquisition strategies. BioBO combines biologically grounded priors with acquisition functions in a principled framework, which biases the search toward promising genes while maintaining the ability to explore uncertain regions. Through experiments on established public benchmarks and datasets, we demonstrate that BioBO improves labeling efficiency by 25-40%, and consistently outperforms conventional BO by identifying top-performing perturbations more effectively. Moreover, by incorporating enrichment analysis, BioBO yields pathway-level explanations for selected perturbations, offering mechanistic interpretability that links designs to biologically coherent regulatory circuits.
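The abstract does not give the exact form in which the enrichment-based prior enters the acquisition function, so the following is only a minimal sketch of one plausible construction, not BioBO's actual implementation. It assumes a UCB acquisition with an additive log-prior bonus whose weight decays as observations accumulate, and it uses simple fold enrichment as a stand-in for a proper enrichment test (for example, a hypergeometric test). The function names `pathway_prior` and `select_next`, the parameters `kappa`, `beta`, and `eps`, and the toy data are all hypothetical.

```python
import numpy as np


def pathway_prior(membership, hit_idx, eps=1e-3):
    """Turn pathway enrichment among current hits into a per-gene prior.

    membership: (n_genes, n_pathways) 0/1 array of pathway annotations.
    hit_idx: indices of the best genes observed so far.
    NOTE: fold enrichment is an assumed stand-in for a real enrichment test.
    """
    membership = np.asarray(membership, dtype=float)
    background = membership.mean(axis=0)          # share of all genes per pathway
    hit_rate = membership[hit_idx].mean(axis=0)   # share of hits per pathway
    fold = hit_rate / np.maximum(background, eps)
    gene_score = (membership * fold).max(axis=1)  # best-enriched pathway per gene
    prior = gene_score + eps                      # keep every gene reachable
    return prior / prior.sum()


def select_next(mean, std, prior, observed, n_observed, kappa=2.0, beta=1.0):
    """Pick the next gene by UCB plus a decaying log-prior bonus (assumed form)."""
    ucb = np.asarray(mean, dtype=float) + kappa * np.asarray(std, dtype=float)
    # The prior term shrinks with n_observed, so data eventually dominates.
    score = ucb + (beta / max(n_observed, 1)) * np.log(prior)
    score[np.asarray(observed, dtype=bool)] = -np.inf  # never re-select a gene
    return int(np.argmax(score))


# Toy example: 4 genes, 2 pathways; gene 0 is the only hit so far.
membership = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
prior = pathway_prior(membership, hit_idx=[0])
observed = [True, False, False, False]

# With an uninformative surrogate, the prior steers the choice to gene 1,
# which shares a pathway with the observed hit.
next_gene = select_next(np.zeros(4), np.ones(4), prior, observed, n_observed=1)
```

In this sketch the surrogate's uncertainty term still drives exploration, while the prior only tilts the ranking early on. Because the enriched pathway is computed explicitly, the pathway responsible for a selection can be reported alongside the chosen gene, which mirrors the pathway-level explanations the abstract describes.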