AI Summary
Drug molecule optimization requires precise, multi-objective enhancement of molecular properties to meet pharmaceutical standards while preserving attributes that are already satisfied. This fine-grained, property-specific task is poorly addressed by existing computational methods and by general instruction-tuned large language models (LLMs).
Method: We introduce C-MuMOInstruct, the first instruction-tuning dataset tailored for multi-attribute selective molecular optimization, and propose the GeLLMO-Cs model family, integrating multi-attribute constraint modeling, controllable graph-to-SMILES molecular generation, and zero-shot transfer learning.
Contribution/Results: GeLLMO-Cs generalize across tasks without retraining and achieve up to 126% higher success rate than state-of-the-art baselines across five in-distribution and five out-of-distribution optimization tasks. This work is a step toward AI-driven, precision drug design grounded in attribute-aware, instruction-guided molecular optimization.
Abstract
In real-world drug design, molecule optimization requires selectively improving multiple molecular properties up to pharmaceutically relevant levels, while maintaining others that already meet such criteria. However, existing computational approaches and instruction-tuned LLMs fail to capture such nuanced property-specific objectives, limiting their practical applicability. To address this, we introduce C-MuMOInstruct, the first instruction-tuning dataset focused on multi-property optimization with explicit, property-specific objectives. Leveraging C-MuMOInstruct, we develop GeLLMO-Cs, a series of instruction-tuned LLMs that can perform targeted property-specific optimization. Our experiments across 5 in-distribution and 5 out-of-distribution tasks show that GeLLMO-Cs consistently outperform strong baselines, achieving up to 126% higher success rate. Notably, GeLLMO-Cs exhibit impressive zero-shot generalization to novel optimization tasks and unseen instructions. This offers a step toward a foundational LLM to support realistic, diverse optimizations with property-specific objectives. C-MuMOInstruct and code are available at https://github.com/ninglab/GeLLMO-C.
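The property-specific objective described in the abstract (improve the properties that fall short of a threshold, keep the ones that already meet it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Objective` class, the function names, the property names, and the thresholds are hypothetical, and the exact success criterion used by C-MuMOInstruct may differ (for example, it might allow a tolerance on maintained properties).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Objective:
    """Hypothetical per-property target: a threshold and a preferred direction."""
    threshold: float
    higher_is_better: bool = True


def meets(value: float, obj: Objective) -> bool:
    """Return True if a property value satisfies its threshold."""
    if obj.higher_is_better:
        return value >= obj.threshold
    return value <= obj.threshold


def split_objectives(
    before: Dict[str, float], objectives: Dict[str, Objective]
) -> Tuple[List[str], List[str]]:
    """Partition properties into those to improve and those to maintain."""
    improve = [n for n, o in objectives.items() if not meets(before[n], o)]
    maintain = [n for n, o in objectives.items() if meets(before[n], o)]
    return improve, maintain


def is_success(
    before: Dict[str, float],
    after: Dict[str, float],
    objectives: Dict[str, Objective],
) -> bool:
    """Assumed criterion: every sub-threshold property reaches its threshold
    and every property that already met its threshold still meets it."""
    improve, maintain = split_objectives(before, objectives)
    if not improve:
        # Assumption: a molecule with nothing to improve is not counted.
        return False
    return all(meets(after[n], objectives[n]) for n in improve + maintain)


def success_rate(
    pairs: List[Tuple[Dict[str, float], Dict[str, float]]],
    objectives: Dict[str, Objective],
) -> float:
    """Fraction of (before, after) property pairs that count as successes."""
    if not pairs:
        return 0.0
    return sum(is_success(b, a, objectives) for b, a in pairs) / len(pairs)


# Hypothetical properties and values, for illustration only.
objectives = {
    "prop_a": Objective(threshold=0.8, higher_is_better=True),
    "prop_b": Objective(threshold=0.3, higher_is_better=False),
}
before = {"prop_a": 0.5, "prop_b": 0.2}      # prop_a falls short, prop_b is fine
after_ok = {"prop_a": 0.85, "prop_b": 0.25}  # prop_a improved, prop_b maintained
after_bad = {"prop_a": 0.9, "prop_b": 0.4}   # prop_a improved, prop_b regressed
```

Under this reading, `after_bad` fails even though the targeted property improved, because a property that already met its criterion was pushed past its threshold. This is the kind of selective behavior the abstract says general-purpose approaches miss.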