Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search

📅 2025-12-10
🤖 AI Summary
Drug discovery suffers from low efficiency, low success rates, and difficult multi-objective optimization. This paper introduces Trio, a language model–driven, fragment-based molecular generation framework. Trio integrates context-aware fragment assembly, policy-gradient reinforcement learning, and Monte Carlo tree search (MCTS) to jointly optimize multiple properties (binding affinity, drug-likeness, and synthetic accessibility) under protein pocket constraints. Its core innovation is to model molecular design as an interpretable, strategy-guided search process, overcoming the limitations of single-objective optimization. Experiments show that 100% of the generated molecules are chemically valid. Compared with state-of-the-art methods, Trio improves binding affinity by 7.85%, drug-likeness by 11.10%, and synthetic accessibility by 12.05%, while expanding molecular diversity more than fourfold.

📝 Abstract
Drug discovery is a time-consuming and expensive process, with traditional high-throughput and docking-based virtual screening hampered by low success rates and limited scalability. Recent advances in generative modelling, including autoregressive, diffusion, and flow-based approaches, have enabled de novo ligand design beyond the limits of enumerative screening. Yet these models often suffer from inadequate generalization, limited interpretability, and an overemphasis on binding affinity at the expense of key pharmacological properties, thereby restricting their translational utility. Here we present Trio, a molecular generation framework integrating fragment-based molecular language modeling, reinforcement learning, and Monte Carlo tree search, for effective and interpretable closed-loop targeted molecular design. Through the three key components, Trio enables context-aware fragment assembly, enforces physicochemical and synthetic feasibility, and guides a balanced search between the exploration of novel chemotypes and the exploitation of promising intermediates within protein binding pockets. Experimental results show that Trio reliably achieves chemically valid and pharmacologically enhanced ligands, outperforming state-of-the-art approaches with improved binding affinity (+7.85%), drug-likeness (+11.10%) and synthetic accessibility (+12.05%), while expanding molecular diversity more than fourfold.
Problem

Research questions and friction points this paper is trying to address.

Overcoming low success rates in drug discovery via generative models
Addressing poor generalization and interpretability in molecular design
Balancing binding affinity with pharmacological properties for better ligands
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates fragment-based language modeling, reinforcement learning, and Monte Carlo tree search
Enables context-aware fragment assembly with physicochemical feasibility
Guides balanced exploration and exploitation in protein binding pockets
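
The exploration/exploitation balance mentioned above is commonly handled in Monte Carlo tree search with a UCB1-style selection rule. The following is a minimal generic sketch of that rule, not Trio's actual implementation: the function names, the exploration constant `c`, and the fragment statistics are all hypothetical and chosen only for illustration.

```python
import math

def uct_score(total_reward, visits, parent_visits, c=1.4):
    """UCB1 score: mean reward (exploitation) plus a visit-count bonus (exploration)."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def select_child(children, c=1.4):
    """Return the name of the child with the highest UCT score.

    `children` maps a hypothetical fragment name to (total_reward, visits).
    """
    parent_visits = sum(v for _, v in children.values())
    return max(children, key=lambda name: uct_score(*children[name], parent_visits, c))

# Hypothetical statistics for three candidate fragments (illustrative, not from the paper).
children = {
    "frag_a": (4.5, 5),  # high mean reward, visited often
    "frag_b": (0.8, 1),  # slightly lower mean reward, visited once
    "frag_c": (0.0, 0),  # never visited
}
```

With these made-up numbers, a larger `c` steers selection toward rarely visited fragments (exploration), while `c=0` reduces the rule to picking the highest mean reward (exploitation).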
Junkai Ji
School of Artificial Intelligence, Shenzhen University, Shenzhen, China
Zhangfan Yang
School of Artificial Intelligence, Shenzhen University, Shenzhen, China; School of Computer Science, University of Nottingham Ningbo, Ningbo, China
Dong Xu
School of Artificial Intelligence, Shenzhen University, Shenzhen, China
Ruibin Bai
School of Computer Science, University of Nottingham Ningbo, Ningbo, China
Jianqiang Li
School of Artificial Intelligence, Shenzhen University, Shenzhen, China
Tingjun Hou
Qiushi Professor of Pharmaceutical Science, Zhejiang University
Computer-aided drug design, Computational biology, Cheminformatics, Bioinformatics
Zexuan Zhu
Shenzhen University
Evolutionary Computation, Memetic Computing, Bioinformatics, Machine Learning