BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

📅 2024-05-27
🏛️ arXiv.org
📈 Citations: 8
Influential: 0
🤖 AI Summary
This work addresses a central challenge in designing genetic perturbation experiments: finding a small subset of genes, out of many candidates, whose perturbation produces a target phenotype (e.g., cell growth). It introduces BioDiscoveryAgent, a closed-loop agent built on a large language model (Claude 3.5 Sonnet) that needs neither a task-specific trained model nor a hand-designed acquisition function. The agent can search the biomedical literature, execute code to analyze biological datasets, and prompt a second agent to critique its predictions, and it remains interpretable at every stage. Across six datasets, it achieves an average 21% improvement in predicting relevant genetic perturbations over Bayesian optimization baselines, and a 46% improvement on the harder task of non-essential gene perturbation. It also predicts gene combinations to perturb more than twice as accurately as a random baseline.

📝 Abstract
Agents based on large language models have shown great potential in accelerating scientific discovery by leveraging their rich background knowledge and reasoning capabilities. In this paper, we introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. We demonstrate our agent on the problem of designing genetic perturbation experiments, where the aim is to find a small subset out of many possible genes that, when perturbed, result in a specific phenotype (e.g., cell growth). Utilizing its biological knowledge, BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model or explicitly design an acquisition function as in Bayesian optimization. Moreover, BioDiscoveryAgent, using Claude 3.5 Sonnet, achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets, and a 46% improvement in the harder task of non-essential gene perturbation, compared to existing Bayesian optimization baselines specifically trained for this task. Our evaluation includes one dataset that is unpublished, ensuring it is not part of the language model's training data. Additionally, BioDiscoveryAgent predicts gene combinations to perturb more than twice as accurately as a random baseline, a task so far not explored in the context of closed-loop experiment design. The agent also has access to tools for searching the biomedical literature, executing code to analyze biological datasets, and prompting another agent to critically evaluate its predictions. Overall, BioDiscoveryAgent is interpretable at every stage, representing an accessible new paradigm in the computational design of biological experiments with the potential to augment scientists' efficacy.
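The closed-loop setting described in the abstract can be pictured with a small simulation: in each round a proposer picks a batch of untested genes, the (simulated) screen reports which of them are hits, and the results are fed back for the next round. The sketch below is illustrative only and is not the authors' implementation. The names `run_closed_loop`, `make_random_proposer`, and the `propose_batch` callback are hypothetical; the random proposer stands in for the random baseline, and an LLM-based agent would occupy the same slot.

```python
import random
from typing import Callable, Dict, List, Set

# Hypothetical proposer signature: (untested genes, results so far, batch size) -> batch
Proposer = Callable[[List[str], Dict[str, bool], int], List[str]]


def run_closed_loop(
    all_genes: List[str],
    true_hits: Set[str],
    propose_batch: Proposer,
    rounds: int,
    batch_size: int,
) -> float:
    """Simulate a multi-round screen and return the fraction of true hits found."""
    observed: Dict[str, bool] = {}  # gene -> whether it produced the phenotype
    for _ in range(rounds):
        untested = [g for g in all_genes if g not in observed]
        if not untested:
            break
        # The proposer sees all earlier results, which closes the loop.
        batch = propose_batch(untested, observed, batch_size)
        for gene in batch:
            # Assumption: a lookup in `true_hits` stands in for the wet-lab assay.
            observed[gene] = gene in true_hits
    found = sum(1 for is_hit in observed.values() if is_hit)
    return found / len(true_hits) if true_hits else 0.0


def make_random_proposer(seed: int) -> Proposer:
    """Baseline that ignores past results and samples untested genes uniformly."""
    rng = random.Random(seed)

    def propose(untested: List[str], observed: Dict[str, bool], k: int) -> List[str]:
        return rng.sample(untested, min(k, len(untested)))

    return propose


if __name__ == "__main__":
    genes = [f"G{i}" for i in range(100)]
    hits = set(genes[:10])  # toy ground truth, not real data
    rate = run_closed_loop(genes, hits, make_random_proposer(0), rounds=5, batch_size=10)
    print(f"Fraction of hits found by the random baseline: {rate:.2f}")
```

Swapping in a different `propose_batch` while holding the rounds and batch size fixed gives a like-for-like comparison of how many hits each strategy recovers.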
Problem

Research questions and friction points this paper is trying to address.

Designs genetic perturbation experiments efficiently
Improves prediction accuracy for gene perturbations
Enhances computational design of biological experiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

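Designs genetic perturbation experiments without training a machine learning model or hand-designing an acquisition function
Improves prediction of relevant genetic perturbations by an average of 21% across six datasets
Uses tools for literature search, code execution, and critique by a second agent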
Yusuf H. Roohani
Department of Biomedical Data Science, Stanford University
Jian Vora
Department of Computer Science, Stanford University
Qian Huang
Department of Computer Science, Stanford University
Zach Steinhart
Gladstone-UCSF Institute of Genomic Immunology
Alex Marson
Gladstone-UCSF Institute of Genomic Immunology, Department of Medicine, University of California, San Francisco
Percy Liang
Associate Professor of Computer Science, Stanford University
machine learning, natural language processing
J. Leskovec
Department of Computer Science, Stanford University