Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference

📅 2026-01-20
🤖 AI Summary
This work addresses two limitations of large language models in structure-based drug design: inadequate understanding of protein structures and insufficient control over molecular generation. The authors propose an exploration-augmented latent inference framework, ELILLM, that recasts generation as three stages: encoding, latent space exploration, and knowledge-guided decoding. Bayesian optimization guides exploration of regions of the latent space that lie beyond the model's current knowledge, while a position-aware surrogate model predicts binding affinity distributions to inform the search. Chemical constraints are applied during decoding so that generated molecules are chemically valid and synthetically reasonable. On the CrossDocked2020 benchmark, the method is compared with seven baseline approaches and reports high binding affinity scores together with strong controlled exploration.

📝 Abstract
Large Language Models (LLMs) possess strong representation and reasoning capabilities, but their application to structure-based drug design (SBDD) is limited by insufficient understanding of protein structures and unpredictable molecular generation. To address these challenges, we propose Exploration-Augmented Latent Inference for LLMs (ELILLM), a framework that reinterprets the LLM generation process as an encoding, latent space exploration, and decoding workflow. ELILLM explicitly explores portions of the design problem beyond the model's current knowledge while using a decoding module to handle familiar regions, generating chemically valid and synthetically reasonable molecules. In our implementation, Bayesian optimization guides the systematic exploration of latent embeddings, and a position-aware surrogate model efficiently predicts binding affinity distributions to inform the search. Knowledge-guided decoding further reduces randomness and effectively imposes chemical validity constraints. We demonstrate ELILLM on the CrossDocked2020 benchmark, showing strong controlled exploration and high binding affinity scores compared with seven baseline methods. These results demonstrate that ELILLM can effectively enhance LLMs' capabilities for SBDD.
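The exploration step described in the abstract can be illustrated with a generic Bayesian optimization loop over latent vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: it swaps in a plain Gaussian-process surrogate with an RBF kernel in place of the paper's position-aware surrogate, uses an upper-confidence-bound acquisition rule, and replaces docking with a hypothetical `toy_score` function in which higher values are better. All function names and parameters (`rbf_kernel`, `gp_posterior`, `explore_latent`, `beta`, the 0.3 perturbation scale) are illustrative choices that do not come from the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Posterior mean and standard deviation of a zero-mean GP surrogate.
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(k_tt, k_tq)          # K^{-1} k(x_train, x_query)
    mean = solved.T @ y_train
    var = 1.0 - np.sum(k_tq * solved, axis=0)     # RBF prior variance is 1
    return mean, np.sqrt(np.maximum(var, 0.0))

def toy_score(z):
    # Hypothetical stand-in for a binding-affinity oracle (higher is better).
    return -np.sum((z - 0.5) ** 2, axis=1)

def explore_latent(z_init, n_rounds=10, n_candidates=256, beta=2.0, seed=0):
    # Assumed loop: fit surrogate, propose candidates, evaluate the best one.
    rng = np.random.default_rng(seed)
    z_seen = z_init.copy()
    y_seen = toy_score(z_seen)
    for _ in range(n_rounds):
        # Propose candidates by perturbing embeddings that were already scored.
        base = z_seen[rng.integers(len(z_seen), size=n_candidates)]
        cand = base + 0.3 * rng.standard_normal(base.shape)
        # Centre the targets so the zero-mean GP assumption is reasonable.
        mean, std = gp_posterior(z_seen, y_seen - y_seen.mean(), cand)
        # Upper confidence bound: favour high predicted score or high uncertainty.
        best = cand[np.argmax(mean + beta * std)]
        z_seen = np.vstack([z_seen, best])
        y_seen = np.append(y_seen, toy_score(best[None, :]))
    return z_seen, y_seen
```

In a full pipeline of the kind the abstract outlines, the selected latent vectors would then be passed to a decoding stage that enforces chemical validity; that stage is omitted here because the source does not specify its interface.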
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Structure-Based Drug Design
Protein Structure Understanding
Molecular Generation
Chemical Validity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Space Exploration
Bayesian Optimization
Structure-Based Drug Design
Knowledge-Guided Decoding
Binding Affinity Prediction
Xuanning Hu
College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Anchen Li
College of Computer Science and Technology, Jilin University
Qianli Xing
Macquarie University
Data Mining, Deep Learning, Crowdsourcing
Jinglong Ji
College of Artificial Intelligence, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Hao Tuo
College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Bo Yang
College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University