🤖 AI Summary
This study addresses the challenge of efficiently adapting general-purpose large language models (LLMs) to molecular science. We propose a unified multi-task curriculum learning framework that post-trains general reasoning models on a curated, high-quality molecular instruction dataset, jointly optimizing molecular structure understanding, property prediction, and generation. Notably, we are the first to successfully extend this paradigm to end-to-end retrosynthetic planning, achieving performance on RetroBench comparable to that of domain-specific methods. Our model establishes new state-of-the-art results across three major benchmarks (LlaSMol, TOMG-Bench, and MuMOInstruct), demonstrating strong cross-task knowledge fusion and generalization. Key contributions include: (1) a large-scale, high-quality molecular instruction dataset built by curating and unifying existing public resources; (2) empirical validation of multi-task curriculum learning for adapting reasoning LLMs to chemistry; and (3) extension of the resulting model to complex molecular reasoning tasks, most notably retrosynthesis.
📝 Abstract
Molecules play a crucial role in biomedical research and discovery, particularly in small-molecule drug development. Given the rapid advancement of large language models, especially the recent emergence of reasoning models, it is natural to ask how a general-purpose language model can be efficiently adapted for molecular science applications. In this work, we introduce BioMedGPT-Mol, a molecular language model designed to support molecular understanding and generation tasks. By curating and unifying existing public instruction datasets, we assemble a large-scale, comprehensive, and high-quality training dataset, and we fine-tune the model through a carefully designed multi-task learning framework. On a consolidated benchmark derived from LlaSMol, TOMG-Bench, and MuMOInstruct, BioMedGPT-Mol achieves state-of-the-art performance. Our experimental results demonstrate that a general-purpose reasoning model can be effectively and efficiently post-trained into a professional molecular language model through a well-structured multi-task curriculum. Building on this capability, we further explore the retrosynthetic planning task, and the resulting performance on RetroBench demonstrates that the model is competitive as an end-to-end retrosynthetic planner. We anticipate that our approach can be extended to other biomedical scientific domains.