Large Language Models for Controllable Multi-property Multi-objective Molecule Optimization

📅 2025-05-29

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Drug molecule optimization requires precise, multi-objective improvement of molecular properties up to pharmaceutical standards while preserving attributes that already meet them. Existing computational methods and general-purpose instruction-tuned large language models (LLMs) handle this fine-grained, property-specific task poorly. Method: We introduce C-MuMOInstruct, the first instruction-tuning dataset for multi-property optimization with explicit, property-specific objectives, and use it to develop GeLLMO-Cs, a family of instruction-tuned LLMs for targeted, property-specific molecule optimization. Contribution/Results: Across 5 in-distribution and 5 out-of-distribution optimization tasks, GeLLMO-Cs outperform strong baselines by up to 126% in success rate. They also generalize 0-shot, without retraining, to novel optimization tasks and unseen instructions. This work is a step toward a foundational LLM for realistic, diverse molecule optimization with property-specific objectives.


📝 Abstract

In real-world drug design, molecule optimization requires selectively improving multiple molecular properties up to pharmaceutically relevant levels, while maintaining others that already meet such criteria. However, existing computational approaches and instruction-tuned LLMs fail to capture such nuanced property-specific objectives, limiting their practical applicability. To address this, we introduce C-MuMOInstruct, the first instruction-tuning dataset focused on multi-property optimization with explicit, property-specific objectives. Leveraging C-MuMOInstruct, we develop GeLLMO-Cs, a series of instruction-tuned LLMs that can perform targeted property-specific optimization. Our experiments across 5 in-distribution and 5 out-of-distribution tasks show that GeLLMO-Cs consistently outperform strong baselines, achieving up to 126% higher success rate. Notably, GeLLMO-Cs exhibit impressive 0-shot generalization to novel optimization tasks and unseen instructions. This offers a step toward a foundational LLM to support realistic, diverse optimizations with property-specific objectives. C-MuMOInstruct and code are accessible through https://github.com/ninglab/GeLLMO-C.
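The abstract's notion of property-specific objectives (improve some properties up to a relevant level, keep others that already satisfy it) can be made concrete with a small success check. The sketch below is one plausible reading, not the paper's evaluation protocol. The `Objective` class, the helpers `meets_objectives` and `success_rate`, the property names, and the thresholds are all hypothetical. It also assumes property values have already been computed, for example by a property predictor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """Hypothetical per-property objective: a threshold and a direction."""
    prop: str                     # property name (illustrative only)
    threshold: float              # assumed "pharmaceutically relevant" level
    higher_is_better: bool = True

    def satisfied(self, value: float) -> bool:
        # A value meets the objective if it is on the good side of the threshold.
        return value >= self.threshold if self.higher_is_better else value <= self.threshold

    def improved(self, before: float, after: float) -> bool:
        # Strict improvement in the objective's preferred direction.
        return after > before if self.higher_is_better else after < before


def meets_objectives(source: dict, optimized: dict,
                     improve: list, maintain: list) -> bool:
    """Assumed success criterion: every 'improve' property gets better and
    reaches its threshold, and every 'maintain' property stays satisfied."""
    for obj in improve:
        before, after = source[obj.prop], optimized[obj.prop]
        if not (obj.improved(before, after) and obj.satisfied(after)):
            return False
    return all(obj.satisfied(optimized[obj.prop]) for obj in maintain)


def success_rate(pairs: list, improve: list, maintain: list) -> float:
    """Fraction of (source, optimized) property-dict pairs that succeed."""
    if not pairs:
        return 0.0
    hits = sum(meets_objectives(s, o, improve, maintain) for s, o in pairs)
    return hits / len(pairs)


# Illustrative task: improve two properties, maintain a third (made-up values).
improve = [Objective("solubility", -3.0),
           Objective("toxicity", 0.5, higher_is_better=False)]
maintain = [Objective("qed", 0.6)]
pairs = [
    ({"solubility": -4.2, "toxicity": 0.7, "qed": 0.65},
     {"solubility": -2.5, "toxicity": 0.4, "qed": 0.62}),   # meets all objectives
    ({"solubility": -4.2, "toxicity": 0.7, "qed": 0.65},
     {"solubility": -3.5, "toxicity": 0.4, "qed": 0.70}),   # solubility below threshold
]
print(success_rate(pairs, improve, maintain))  # 0.5
```

Each `Objective` carries its own direction, so one rule covers properties where lower is better (such as the toxicity score here). The source and optimized values are compared per property, not aggregated into a single score. This matches the abstract's point that each objective is property-specific.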

Problem

Research questions and friction points this paper is trying to address.

Optimizing multiple molecular properties simultaneously in drug design

Addressing limitations of existing LLMs in property-specific objectives

Enhancing success rates and generalization in molecule optimization tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Instruction-tuning dataset for multi-property optimization

LLMs for targeted property-specific molecule optimization

0-shot generalization to novel optimization tasks
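An instruction-tuning dataset pairs a natural-language instruction naming the properties to improve and to maintain with an input molecule and a target output. The sketch below shows how one such record might be rendered. The templates, field names, and the function `build_record` are hypothetical and do not reflect the actual format of C-MuMOInstruct. Several paraphrased templates are included only to illustrate how wording can vary, which is the kind of variation that tests generalization to unseen instructions.

```python
import json

# Hypothetical paraphrased templates (not the dataset's real wording).
TEMPLATES = [
    "Modify the molecule {smiles} to improve {improve}{maintain_clause}.",
    "Given {smiles}, propose a similar molecule with better {improve}{maintain_clause}.",
]


def build_record(smiles: str, improve: list, maintain: list,
                 target: str = None, template_id: int = 0) -> dict:
    """Render one illustrative instruction-tuning record as a dict."""
    maintain_clause = (
        f" while keeping {' and '.join(maintain)} at acceptable levels"
        if maintain else ""
    )
    instruction = TEMPLATES[template_id].format(
        smiles=smiles,
        improve=" and ".join(improve),
        maintain_clause=maintain_clause,
    )
    # 'output' holds the optimized SMILES when available (training data),
    # and is None when the record is used as an inference prompt.
    return {"instruction": instruction, "output": target}


record = build_record("CCO", ["solubility"], ["qed"], target="CCN")
print(json.dumps(record))
```

Keeping the property lists separate from the template text lets the same objective be expressed in several wordings. One way to probe instruction-level generalization is to hold some templates out of training.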

🔎 Similar Papers

Efficient Evolutionary Search Over Chemical Space with Large Language Models