🤖 AI Summary
This work addresses the limitations of large language models in structure-based drug design, which often stem from inadequate understanding of protein structures and insufficient control over molecular generation. The authors propose an exploration-augmented latent reasoning framework that decouples the generation process into three stages: encoding, latent space exploration, and knowledge-guided decoding. Bayesian optimization is employed to actively explore underrepresented regions of the latent space, while a position-aware surrogate model predicts binding affinities. Chemical constraints are integrated during decoding to ensure molecular validity and synthesizability. Evaluated on the CrossDocked2020 benchmark, the method significantly outperforms seven baseline approaches, achieving a favorable balance among high binding affinity, structural diversity, and controllable generation.
📝 Abstract
Large Language Models (LLMs) possess strong representation and reasoning capabilities, but their application to structure-based drug design (SBDD) is limited by insufficient understanding of protein structures and unpredictable molecular generation. To address these challenges, we propose Exploration-Augmented Latent Inference for LLMs (ELILLM), a framework that reinterprets the LLM generation process as an encoding, latent space exploration, and decoding workflow. ELILLM explicitly explores portions of the design problem beyond the model's current knowledge while using a decoding module to handle familiar regions, generating chemically valid and synthetically reasonable molecules. In our implementation, Bayesian optimization guides the systematic exploration of latent embeddings, and a position-aware surrogate model efficiently predicts binding affinity distributions to inform the search. Knowledge-guided decoding further reduces randomness and effectively imposes chemical validity constraints. We demonstrate ELILLM on the CrossDocked2020 benchmark, achieving controlled exploration and higher binding-affinity scores than seven baseline methods. These results demonstrate that ELILLM can effectively enhance LLMs' capabilities for SBDD.
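The abstract's central loop, Bayesian optimization over latent embeddings guided by a surrogate that predicts binding affinity, can be illustrated with a toy sketch. The paper does not specify its encoder, surrogate architecture, or decoder, so everything below is a hypothetical stand-in: `toy_affinity` plays the role of the position-aware surrogate (lower is better), the latent space is a 2-D box, and a small NumPy Gaussian process drives a lower-confidence-bound acquisition step.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_affinity(z):
    # Hypothetical stand-in for a surrogate/docking score of a decoded
    # molecule; lower is better, with the optimum near z = (1, -1).
    return np.sum((np.asarray(z) - np.array([1.0, -1.0])) ** 2, axis=-1)

def rbf_kernel(A, B, ls=1.0):
    # Squared-exponential kernel between two sets of latent points.
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(Z_obs, y_obs, Z_cand, noise=1e-4):
    # GP posterior mean/variance at candidate latents given observations.
    K = rbf_kernel(Z_obs, Z_obs) + noise * np.eye(len(Z_obs))
    Ks = rbf_kernel(Z_cand, Z_obs)
    K_inv = np.linalg.inv(K)
    mu = Ks @ K_inv @ y_obs
    var = 1.0 - np.sum((Ks @ K_inv) * Ks, axis=1)
    return mu, np.clip(var, 1e-12, None)

def bayesian_explore(n_init=5, n_iter=20, beta=2.0):
    # Seed with a few random latent points, then iteratively pick the
    # candidate minimizing the lower confidence bound mu - beta * sigma,
    # which trades off low predicted affinity against high uncertainty
    # (i.e., exploring underrepresented latent regions).
    Z = rng.uniform(-3, 3, size=(n_init, 2))
    y = toy_affinity(Z)
    for _ in range(n_iter):
        cand = rng.uniform(-3, 3, size=(256, 2))
        mu, var = gp_posterior(Z, y, cand)
        z_next = cand[np.argmin(mu - beta * np.sqrt(var))]
        Z = np.vstack([Z, z_next])
        y = np.append(y, toy_affinity(z_next))
    best = Z[np.argmin(y)]
    return best, float(y.min())
```

In ELILLM's framing, each selected `z_next` would be passed to the knowledge-guided decoder to produce a valid molecule before scoring; here the surrogate is queried directly, since decoding and chemical-constraint handling are beyond this sketch.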