🤖 AI Summary
Retrosynthesis has traditionally relied on chemists choosing a bond disconnection by hand, which is slow and depends heavily on expert intuition, while existing molecular LLM approaches predict reactants without explicit bond-disconnection reasoning. To address this limitation, this work proposes RetroReasoner, a large language model for retrosynthesis that integrates chemists' strategic thinking by incorporating a structured bond-disconnection reasoning mechanism into molecular language modeling for the first time. The model is trained with supervised fine-tuning followed by reinforcement learning, and it explicitly generates both a disconnection strategy and the corresponding reactants. Round-trip accuracy serves as the reinforcement reward: a prediction is rewarded when a forward synthesis model maps the predicted reactants back to the original product, which encourages chemically feasible outputs. Experimental results show that RetroReasoner outperforms existing approaches across multiple benchmarks, with the largest gains on complex reactions, where it produces more diverse and chemically plausible reactant proposals.
📝 Abstract
Retrosynthesis prediction is a core task in organic synthesis that aims to predict reactants for a given product molecule. Traditionally, chemists select a plausible bond disconnection and derive the corresponding reactants, a process that is time-consuming and requires substantial expertise. Although recent molecular large language models (LLMs) have made progress, many methods either predict reactants without strategic reasoning or conduct only a generic product analysis, rather than reasoning explicitly about the bond-disconnection strategies that logically lead to the choice of specific reactants. To overcome these limitations, we propose RetroReasoner, a retrosynthetic reasoning model that leverages chemists' strategic thinking. RetroReasoner is trained using both supervised fine-tuning (SFT) and reinforcement learning (RL). For SFT, we introduce SyntheticRetro, a framework that generates structured disconnection rationales alongside reactant predictions. For RL, we use round-trip accuracy as the reward: predicted reactants are passed through a forward synthesis model, and a prediction is rewarded when the forward-predicted product matches the original input product. Experimental results show that RetroReasoner not only outperforms prior baselines but also generates a broader range of feasible reactant proposals, particularly on more challenging reaction instances.
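The round-trip reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the binary 0/1 reward value, the function names `round_trip_reward`, `normalize`, and `toy_forward_model`, and the lookup-table forward model are all assumptions made for illustration. In practice the forward model would be a learned reaction predictor, and the string normalization would be replaced by proper SMILES canonicalization from a cheminformatics toolkit.

```python
from typing import Callable, Optional


def normalize(smiles: str) -> str:
    """Toy stand-in for SMILES canonicalization (an assumption, not real chemistry).

    It only strips whitespace and sorts dot-separated fragments so that
    the order of reactants does not affect the comparison.
    """
    parts = [part.strip() for part in smiles.split(".") if part.strip()]
    return ".".join(sorted(parts))


# Hypothetical lookup table playing the role of a forward synthesis model.
# Key: normalized reactants; value: product. One entry: esterification of
# acetic acid with ethanol to give ethyl acetate.
_TOY_REACTIONS = {
    normalize("CC(=O)O.CCO"): "CC(=O)OCC",
}


def toy_forward_model(reactants: str) -> Optional[str]:
    """Return the product for known reactants, or None if the set is unknown."""
    return _TOY_REACTIONS.get(normalize(reactants))


def round_trip_reward(
    product: str,
    predicted_reactants: str,
    forward_model: Callable[[str], Optional[str]],
    canonicalize: Callable[[str], str] = normalize,
) -> float:
    """Binary round-trip reward (the 0/1 scale is an assumption).

    Returns 1.0 when the forward model maps the predicted reactants back
    to the original product, and 0.0 otherwise.
    """
    forward_product = forward_model(predicted_reactants)
    if not forward_product:
        return 0.0  # The forward model produced nothing usable.
    matches = canonicalize(forward_product) == canonicalize(product)
    return 1.0 if matches else 0.0


if __name__ == "__main__":
    target = "CC(=O)OCC"  # ethyl acetate
    print(round_trip_reward(target, "CCO.CC(=O)O", toy_forward_model))
    print(round_trip_reward(target, "CCO.CCO", toy_forward_model))
```

The first call rewards a reactant set that the toy forward model maps back to the target, regardless of reactant order; the second receives no reward because the forward model cannot reproduce the target from those reactants. Passing the forward model and canonicalizer as arguments keeps the reward logic independent of any particular chemistry backend.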