Large Language Model Agent for Modular Task Execution in Drug Discovery

Date: 2025-06-25
Venue: bioRxiv
Citations: 1
Influential: 0

AI Summary
Problem: Early-stage computational drug discovery involves several tasks that are difficult to automate together: data retrieval, molecular generation, property prediction and optimization, and protein–ligand complex structure generation. Method: This study introduces a modular, end-to-end drug discovery framework built on large language models (LLMs). It combines LLM-based reasoning with domain-specific tools, including Boltz-2 for 3D protein–ligand structure generation and binding affinity estimation, ADMET property predictors, and SMILES manipulation modules, to enable iterative, multi-round molecular design and evaluation. The architecture supports flexible incorporation of new models. Contribution/Results: In a BCL-2–targeted case study, two refinement rounds over a pool of 194 molecules increased the number of compounds with QED > 0.6 from 34 to 55 and the number passing at least four of five empirical drug-likeness rules from 29 to 52. Boltz-2 was also used to generate 3D protein–ligand complex structures with rapid binding affinity estimates. The authors present these results as support for more automated and scalable computational drug discovery.

Abstract
We present a modular framework powered by large language models (LLMs) that automates and streamlines key tasks across the early-stage computational drug discovery pipeline. By combining LLM reasoning with domain-specific tools, the framework performs biomedical data retrieval, domain-specific question answering, molecular generation, property prediction, property-aware molecular refinement, and 3D protein–ligand structure generation. In a case study targeting BCL-2 in lymphocytic leukemia, the agent autonomously retrieved relevant biomolecular information—including FASTA sequences, SMILES representations, and literature—and answered mechanistic questions with improved contextual accuracy over standard LLMs. It then generated chemically diverse seed molecules and predicted 67 ADMET-related properties, which guided iterative molecular refinement. Across two refinement rounds, the number of molecules with QED > 0.6 increased from 34 to 55, and those passing at least four out of five empirical drug-likeness rules rose from 29 to 52, within a pool of 194 molecules. The framework also employed Boltz-2 to generate 3D protein–ligand complexes and provide rapid binding affinity estimates for candidate compounds. These results demonstrate that the approach effectively supports molecular screening, prioritization, and structure evaluation. Its modular design enables flexible integration of evolving tools and models, providing a scalable foundation for AI-assisted therapeutic discovery.
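
To put the reported counts in proportion, the following minimal sketch converts them into fractions of the 194-molecule pool. Only the counts (34, 55, 29, 52, 194) come from the abstract; the function and variable names are illustrative and not from the paper.

```python
# Counts reported in the abstract; all names here are illustrative.
POOL_SIZE = 194

# criterion -> (count before refinement, count after two refinement rounds)
reported = {
    "QED > 0.6": (34, 55),
    "passes >= 4 of 5 drug-likeness rules": (29, 52),
}


def summarize(before: int, after: int, pool: int = POOL_SIZE) -> dict:
    """Return pass percentages before and after refinement and the net gain."""
    return {
        "before_pct": round(100 * before / pool, 1),
        "after_pct": round(100 * after / pool, 1),
        "gain": after - before,
    }


summary = {name: summarize(b, a) for name, (b, a) in reported.items()}

for name, s in summary.items():
    print(f"{name}: {s['before_pct']}% -> {s['after_pct']}% (+{s['gain']} molecules)")
```

In other words, the share of the pool meeting each criterion rose from roughly one in six to more than one in four, while most of the pool still fell short after two rounds.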
Problem

Research questions and friction points this paper is trying to address.

Automating the early-stage drug discovery pipeline using LLMs
Generating and refining molecules with predicted ADMET properties
Creating 3D protein–ligand complexes for binding affinity evaluation
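
As a rough illustration of property-guided refinement, the sketch below splits a pool into molecules that already meet the targets and molecules that would be sent to another refinement round. It is a sketch under stated assumptions: the `Candidate` record, the helper names, and the choice to require both criteria at once are hypothetical (the paper reports the two criteria separately), and the QED and rule counts shown are placeholder values, not computed or reported results.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record holding precomputed properties for one molecule."""
    smiles: str
    qed: float         # drug-likeness score in [0, 1], assumed precomputed
    rules_passed: int  # number of drug-likeness rules satisfied, 0 to 5


def meets_targets(c: Candidate, qed_cutoff: float = 0.6, min_rules: int = 4) -> bool:
    """True if the candidate exceeds the QED cutoff and passes enough rules."""
    return c.qed > qed_cutoff and c.rules_passed >= min_rules


def split_pool(pool: List[Candidate]) -> Tuple[List[Candidate], List[Candidate]]:
    """Split the pool into (keep, refine) according to meets_targets."""
    keep = [c for c in pool if meets_targets(c)]
    refine = [c for c in pool if not meets_targets(c)]
    return keep, refine


# Placeholder property values for illustration only.
pool = [
    Candidate("CCO", qed=0.41, rules_passed=3),
    Candidate("c1ccccc1", qed=0.44, rules_passed=4),
    Candidate("CC(=O)Oc1ccccc1C(=O)O", qed=0.55, rules_passed=5),
    Candidate("CC(C)Cc1ccc(cc1)C(C)C(=O)O", qed=0.82, rules_passed=5),
]

keep, refine = split_pool(pool)
print("keep:", [c.smiles for c in keep])
print("refine:", [c.smiles for c in refine])
```

In a real pipeline the properties would come from the prediction tools, and the `refine` list would be passed back to the molecular refinement step.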
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular LLM framework automates drug discovery tasks
Integrates domain tools for molecular generation and refinement
Generates 3D protein-ligand complexes with binding estimates
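
One common way to get this kind of modularity is a tool registry that the agent dispatches to by name, so that a new model can be added without touching the dispatch logic. The sketch below assumes that pattern; the paper's actual implementation is not described here, and `ToolRegistry`, the tool names, and the stub functions are all hypothetical.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical name-to-callable registry for domain-specific tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that registers a function under the given tool name."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._tools:
                raise ValueError(f"tool already registered: {name}")
            self._tools[name] = fn
            return fn
        return decorator

    def names(self) -> List[str]:
        """Return the registered tool names in sorted order."""
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        """Dispatch to a registered tool; raise KeyError if it is unknown."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("property_prediction")
def predict_properties_stub(smiles: str) -> dict:
    # Stub: a real tool would return predicted ADMET values here.
    return {"smiles": smiles, "properties": {}}


@registry.register("structure_generation")
def generate_complex_stub(sequence: str, smiles: str) -> dict:
    # Stub: a real tool would call a structure predictor here.
    return {"sequence_length": len(sequence), "smiles": smiles, "structure": None}


result = registry.call("property_prediction", smiles="CCO")
print(registry.names())
print(result)
```

With this layout, swapping in a newer predictor amounts to registering a different function under the same name, and the calling code stays unchanged.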
Authors

Janghoon Ock
Assistant Professor, University of Nebraska-Lincoln
Computational Catalysis, Material Discovery, AI4Science

Radheesh Sharma Meda
Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA

Srivathsan Badrinarayanan
Researcher, Carnegie Mellon University
Chemical Engineering, Machine Learning, AI4Science

Neha S Aluru
School of Engineering Medicine, Texas A&M University, Houston, TX 77030, USA

Achuth Chandrasekhar
Graduate Student, Carnegie Mellon University
Additive Manufacturing, Deep Learning

A. Farimani
Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA