Large Language Model Agent for Modular Task Execution in Drug Discovery

📅 2025-06-25

🏛️ bioRxiv

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Early-stage computational drug discovery faces challenges in data retrieval, molecular generation, property prediction and optimization, and protein–ligand complex structure generation. Method: This study introduces a modular, end-to-end drug discovery framework built on large language models (LLMs). It combines LLM-based reasoning with domain-specific tools, including Boltz-2 for 3D protein–ligand structure generation and binding affinity estimation, ADMET property predictors, and SMILES manipulation modules, to enable iterative, multi-round molecular design and evaluation. The architecture supports flexible incorporation of new models. Contribution/Results: In a BCL-2 case study, two refinement rounds over a pool of 194 molecules increased the number of compounds with QED > 0.6 from 34 to 55 and the number passing at least four of five empirical drug-likeness rules from 29 to 52. The framework also generated 3D protein–ligand complex structures and rapid binding affinity estimates for candidate compounds.

📝 Abstract

We present a modular framework powered by large language models (LLMs) that automates and streamlines key tasks across the early-stage computational drug discovery pipeline. By combining LLM reasoning with domain-specific tools, the framework performs biomedical data retrieval, domain-specific question answering, molecular generation, property prediction, property-aware molecular refinement, and 3D protein–ligand structure generation. In a case study targeting BCL-2 in lymphocytic leukemia, the agent autonomously retrieved relevant biomolecular information—including FASTA sequences, SMILES representations, and literature—and answered mechanistic questions with improved contextual accuracy over standard LLMs. It then generated chemically diverse seed molecules and predicted 67 ADMET-related properties, which guided iterative molecular refinement. Across two refinement rounds, the number of molecules with QED > 0.6 increased from 34 to 55, and those passing at least four out of five empirical drug-likeness rules rose from 29 to 52, within a pool of 194 molecules. The framework also employed Boltz-2 to generate 3D protein–ligand complexes and provide rapid binding affinity estimates for candidate compounds. These results demonstrate that the approach effectively supports molecular screening, prioritization, and structure evaluation. Its modular design enables flexible integration of evolving tools and models, providing a scalable foundation for AI-assisted therapeutic discovery.
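The two screening counts in the abstract (QED > 0.6, and at least four of five drug-likeness rules) can be illustrated with a minimal Python sketch. It is not the authors' code: the `Candidate` class, the function names, and the rule labels are hypothetical, and the abstract does not say which five rules are used, so each rule is represented only as a precomputed pass/fail flag. The `percent_of_pool` helper restates the reported counts as a share of the 194-molecule pool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Candidate:
    """One molecule with precomputed scores (illustrative values only)."""
    smiles: str
    qed: float
    # Hypothetical rule labels mapped to pass/fail flags; the source
    # does not name the five empirical drug-likeness rules.
    rules: Dict[str, bool] = field(default_factory=dict)


def passes_qed(c: Candidate, threshold: float = 0.6) -> bool:
    """True when QED is strictly above the threshold (QED > 0.6 in the abstract)."""
    return c.qed > threshold


def passes_rules(c: Candidate, min_passed: int = 4) -> bool:
    """True when at least `min_passed` rule flags are True."""
    return sum(c.rules.values()) >= min_passed


def screen(pool: List[Candidate]) -> Dict[str, int]:
    """Count how many molecules in the pool meet each criterion."""
    return {
        "qed": sum(passes_qed(c) for c in pool),
        "rules": sum(passes_rules(c) for c in pool),
    }


def percent_of_pool(count: int, pool_size: int = 194) -> float:
    """Express a count as a percentage of the pool, rounded to one decimal."""
    return round(100 * count / pool_size, 1)


if __name__ == "__main__":
    # Toy pool with made-up scores, only to show the counting logic.
    toy_pool = [
        Candidate("CCO", 0.41, {f"rule_{i}": i < 3 for i in range(5)}),
        Candidate("c1ccccc1O", 0.62, {f"rule_{i}": i < 4 for i in range(5)}),
        Candidate("CC(=O)N", 0.71, {f"rule_{i}": True for i in range(5)}),
    ]
    print(screen(toy_pool))
    # Counts reported in the abstract, as a share of the 194 molecules.
    print(percent_of_pool(34), percent_of_pool(55))
    print(percent_of_pool(29), percent_of_pool(52))
```

Read this way, the reported gains correspond to roughly 17.5% to 28.4% of the pool for the QED criterion and 14.9% to 26.8% for the rule-based criterion.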

Problem

Research questions and friction points this paper is trying to address.

Automating the early-stage drug discovery pipeline using LLMs

Generating and refining molecules with predicted ADMET properties

Creating 3D protein-ligand complexes for binding affinity evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular LLM framework automates drug discovery tasks

Integrates domain tools for molecular generation and refinement

Generates 3D protein-ligand complexes with binding estimates
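One way to picture the modular design described above is a registry that maps task names to interchangeable tool functions, so a new model can be added without changing the code that dispatches tasks. The sketch below is an assumption about how such a design could look, not the paper's implementation: `ToolRegistry`, the task names, and both placeholder tools are hypothetical, and the placeholders return dummy values instead of calling a real ADMET predictor or Boltz-2.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Maps task names to callables so tools can be swapped or added."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}

    def register(self, name: str) -> Callable:
        """Decorator that stores a function under the given task name."""
        def decorator(fn: Callable[..., object]) -> Callable[..., object]:
            self._tools[name] = fn
            return fn
        return decorator

    def run(self, name: str, **kwargs) -> object:
        """Dispatch a task by name, as an LLM-chosen tool call might."""
        if name not in self._tools:
            raise KeyError(f"no tool registered as {name!r}")
        return self._tools[name](**kwargs)

    def names(self) -> List[str]:
        """List registered task names in sorted order."""
        return sorted(self._tools)


registry = ToolRegistry()


@registry.register("predict_properties")
def predict_properties(smiles: str) -> dict:
    # Placeholder: a real module would call an ADMET property predictor.
    return {"smiles": smiles, "properties": {}}


@registry.register("generate_complex")
def generate_complex(sequence: str, smiles: str) -> dict:
    # Placeholder standing in for a structure tool such as Boltz-2;
    # no structure is actually computed here.
    return {"sequence_length": len(sequence), "ligand": smiles, "structure": None}


if __name__ == "__main__":
    print(registry.names())
    print(registry.run("predict_properties", smiles="CCO"))
```

Under this arrangement, replacing a predictor means registering a different function under the same task name, which is the kind of flexibility the modular claim implies.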

🔎 Similar Papers

An Autonomous Large Language Model Agent for Chemical Literature Data Mining