A Survey of Large Language Models for Text-Guided Molecular Discovery: from Molecule Generation to Optimization

📅 2025-05-22
🤖 AI Summary
Problem: Molecular discovery remains constrained by the limited capacity of traditional computational methods to integrate heterogeneous data modalities and domain-specific constraints. Method: This survey systematically examines how large language models (LLMs) can transform molecular discovery, focusing on two core tasks: molecular generation conditioned on text and symbolic notations (SMILES/SELFIES), and multi-modal molecular optimization. It proposes a taxonomy of LLM-based molecular discovery tasks that unifies autoregressive generation, instruction tuning, reinforcement-learning-based optimization, symbolic constraint decoding, and multi-modal alignment. An extensible evaluation framework is built on standard benchmarks (e.g., ChEMBL, ZINC), complemented by a structured knowledge graph and an open-source resource repository that supports reproducibility and continuous updates. Contribution/Results: The study establishes a theoretical paradigm, technical pipeline, and benchmark suite for integrating LLMs with computational chemistry, aiming at rigorous, scalable, and interpretable molecular design.

📝 Abstract
Large language models (LLMs) are introducing a paradigm shift in molecular discovery by enabling text-guided interaction with chemical spaces through natural language and symbolic notations, with emerging extensions that incorporate multi-modal inputs. To advance the new field of LLMs for molecular discovery, this survey provides an up-to-date and forward-looking review of the emerging use of LLMs for two central tasks: molecule generation and molecule optimization. Based on our proposed taxonomy for both problems, we analyze representative techniques in each category, highlighting how LLM capabilities are leveraged across different learning settings. In addition, we include the commonly used datasets and evaluation protocols. We conclude by discussing key challenges and future directions, positioning this survey as a resource for researchers working at the intersection of LLMs and molecular science. A continuously updated reading list is available at https://github.com/REAL-Lab-NU/Awesome-LLM-Centric-Molecular-Discovery.

Problem

Research questions and friction points this paper is trying to address.

Surveying LLMs for text-guided molecule generation and optimization
Analyzing LLM techniques for molecular discovery tasks
Discussing challenges in LLM applications for molecular science

Innovation

Methods, ideas, or system contributions that make the work stand out.

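LLMs enable text-guided interaction with chemical spaces through natural language and symbolic notations
Proposes a taxonomy covering both molecule generation and molecule optimization
Summarizes commonly used datasets and evaluation protocols, and discusses key challenges and future directions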