Large Language Model Agent for Modular Task Execution in Drug Discovery

📅 2025-06-25
🏛️ bioRxiv
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
Problem: Early-stage computational drug discovery faces challenges in data retrieval, molecular generation, property prediction and optimization, and protein–ligand complex structure generation. Method: This study introduces a modular, end-to-end framework for drug discovery built on large language models (LLMs). It integrates LLM-based reasoning with domain-specific tools, including Boltz-2 for 3D protein–ligand complex generation and binding affinity estimation, ADMET property predictors, and SMILES manipulation modules, to enable iterative, multi-round molecular design and evaluation. The architecture supports flexible incorporation of new models. Contribution/Results: In a BCL-2–targeted case study, two refinement rounds over a pool of 194 molecules increased the number of compounds with QED > 0.6 from 34 to 55 and the number passing at least four of five empirical drug-likeness rules from 29 to 52; the framework also generated 3D protein–ligand complex structures with Boltz-2. These results support the framework as a scalable, automated foundation for AI-assisted drug discovery.
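The two screening criteria quoted in the summary can be read as simple threshold filters over predicted properties. The sketch below is a minimal illustration, assuming each molecule already carries a predicted QED score and one pass/fail flag per drug-likeness rule. The `Molecule` record, its field names, and the toy values are hypothetical and not taken from the paper, which does not specify its data structures here.

```python
from dataclasses import dataclass
from typing import Dict, List

QED_THRESHOLD = 0.6    # threshold quoted in the summary (strictly greater than)
MIN_RULES_PASSED = 4   # "at least four out of five" drug-likeness rules


@dataclass
class Molecule:
    """Hypothetical record holding precomputed predictions for one molecule."""
    smiles: str
    qed: float                 # predicted QED score in [0, 1]
    rule_results: List[bool]   # one pass/fail flag per drug-likeness rule


def count_hits(pool: List[Molecule]) -> Dict[str, int]:
    """Count molecules meeting each screening criterion independently."""
    high_qed = sum(m.qed > QED_THRESHOLD for m in pool)
    rules_ok = sum(sum(m.rule_results) >= MIN_RULES_PASSED for m in pool)
    return {"qed_gt_0.6": high_qed, "rules_ge_4": rules_ok}


# Toy pool: the scores and flags are made up for illustration, not computed.
toy_pool = [
    Molecule("CCO", 0.41, [True, True, True, True, True]),
    Molecule("c1ccccc1O", 0.62, [True, True, True, True, False]),
    Molecule("CC(=O)Nc1ccc(O)cc1", 0.60, [True, True, False, False, True]),
]

counts = count_hits(toy_pool)
print(counts)
```

Running the same counter on the pool before and after each refinement round would give before/after tallies of the kind the summary reports.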

šŸ“ Abstract
We present a modular framework powered by large language models (LLMs) that automates and streamlines key tasks across the early-stage computational drug discovery pipeline. By combining LLM reasoning with domain-specific tools, the framework performs biomedical data retrieval, domain-specific question answering, molecular generation, property prediction, property-aware molecular refinement, and 3D protein–ligand structure generation. In a case study targeting BCL-2 in lymphocytic leukemia, the agent autonomously retrieved relevant biomolecular information—including FASTA sequences, SMILES representations, and literature—and answered mechanistic questions with improved contextual accuracy over standard LLMs. It then generated chemically diverse seed molecules and predicted 67 ADMET-related properties, which guided iterative molecular refinement. Across two refinement rounds, the number of molecules with QED > 0.6 increased from 34 to 55, and those passing at least four out of five empirical drug-likeness rules rose from 29 to 52, within a pool of 194 molecules. The framework also employed Boltz-2 to generate 3D protein–ligand complexes and provide rapid binding affinity estimates for candidate compounds. These results demonstrate that the approach effectively supports molecular screening, prioritization, and structure evaluation. Its modular design enables flexible integration of evolving tools and models, providing a scalable foundation for AI-assisted therapeutic discovery.
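To put the reported gains in proportion, the counts can be expressed as fractions of the 194-molecule pool. The snippet below only restates figures from the abstract and assumes the pool size is the same before and after refinement; the function and variable names are illustrative.

```python
POOL_SIZE = 194  # pool size reported in the abstract

# Reported counts: (before refinement, after two refinement rounds)
reported = {
    "QED > 0.6": (34, 55),
    "passes >= 4 of 5 drug-likeness rules": (29, 52),
}


def summarize(before: int, after: int, pool: int = POOL_SIZE) -> dict:
    """Convert raw counts into percentages of the pool and an absolute gain."""
    return {
        "before_pct": round(100 * before / pool, 1),
        "after_pct": round(100 * after / pool, 1),
        "gain": after - before,
    }


summary = {name: summarize(*counts) for name, counts in reported.items()}

for name, stats in summary.items():
    print(f"{name}: {stats['before_pct']}% -> {stats['after_pct']}% "
          f"(+{stats['gain']} molecules)")
```

Under that assumption, the share of molecules with QED > 0.6 rises from about 17.5% to 28.4% of the pool, and the share passing at least four rules rises from about 14.9% to 26.8%.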
Problem

Research questions and friction points this paper is trying to address.

Automating the early-stage drug discovery pipeline using LLMs
Generating and refining molecules with predicted ADMET properties
Creating 3D protein-ligand complexes for binding affinity evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular LLM framework automates drug discovery tasks
Integrates domain tools for molecular generation and refinement
Generates 3D protein-ligand complexes with binding estimates
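One way to picture the modular design is a registry that maps task names to interchangeable tool callables, so a newer model can replace an older one without changing the pipeline code. The sketch below is a generic illustration of that pattern, not the authors' implementation: `ToolRegistry`, `run_pipeline`, and both placeholder tools are hypothetical, and the "predictor" is a stand-in that returns string lengths rather than real property predictions.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Tool = Callable[[State], State]


class ToolRegistry:
    """Maps task names to interchangeable tool callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, task: str, tool: Tool) -> None:
        # Re-registering a task swaps in a different tool without touching callers.
        self._tools[task] = tool

    def run(self, task: str, state: State) -> State:
        if task not in self._tools:
            raise KeyError(f"no tool registered for task '{task}'")
        return self._tools[task](state)


def run_pipeline(registry: ToolRegistry, tasks: List[str], state: State) -> State:
    """Run the named tasks in order, threading a shared state dict through them."""
    for task in tasks:
        state = registry.run(task, state)
    return state


def generate(state: State) -> State:
    # Placeholder generator: returns two fixed SMILES strings.
    return {**state, "molecules": ["CCO", "c1ccccc1"]}


def predict(state: State) -> State:
    # Placeholder predictor: uses string length as a fake score.
    scores = {smi: len(smi) for smi in state["molecules"]}
    return {**state, "scores": scores}


registry = ToolRegistry()
registry.register("generate", generate)
registry.register("predict", predict)

result = run_pipeline(registry, ["generate", "predict"], {"target": "BCL-2"})
print(result["scores"])
```

In a design like this, swapping in a different structure or property predictor would amount to a single `register` call, which is the kind of flexibility the framework's modularity claim points to.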
Authors
Janghoon Ock
Assistant Professor, University of Nebraska-Lincoln
Computational Catalysis, Material Discovery, AI4Science
Radheesh Sharma Meda
Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
Srivathsan Badrinarayanan
Researcher, Carnegie Mellon University
Chemical Engineering, Machine Learning, AI4Science
Neha S Aluru
School of Engineering Medicine, Texas A&M University, Houston, TX 77030, USA
Achuth Chandrasekhar
Graduate Student, Carnegie Mellon University
Additive Manufacturing, Deep Learning
A. Farimani
Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA