🤖 AI Summary
This work addresses the misalignment between local reaction prediction and global synthetic objectives in retrosynthetic planning. It introduces ReTriP, an end-to-end chain-of-thought reasoning framework that embeds strategic foresight directly into chemical reasoning. The approach combines a path-coherent molecular representation with a progressive training curriculum that moves from reasoning distillation to reinforcement learning with verifiable rewards, so that each generation step is aligned with the practical utility of the complete synthetic route. On the RetroBench benchmark, the method achieves state-of-the-art performance and outperforms existing hybrid approaches, showing greater robustness and consistency, particularly on long-horizon planning tasks.
📝 Abstract
Retrosynthetic planning is a fundamental task in organic chemistry, yet it remains challenging because of its combinatorial complexity. To address this, conventional approaches typically rely on hybrid frameworks that combine single-step predictions with external search heuristics, a design that inevitably fractures the logical coherence between local molecular transformations and global planning objectives. To bridge this gap and embed sophisticated strategic foresight directly into the model's chemical reasoning, we introduce ReTriP, an end-to-end generative framework that reformulates retrosynthesis as a direct Chain-of-Thought (CoT) reasoning task. We establish a path-coherent molecular representation and employ a progressive training curriculum that transitions from reasoning distillation to reinforcement learning with verifiable rewards, effectively aligning stepwise generation with practical route utility. Empirical evaluation on RetroBench demonstrates that ReTriP achieves state-of-the-art performance and exhibits superior robustness in long-horizon planning compared to hybrid baselines.
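The abstract does not say how the verifiable reward is computed. As a purely hypothetical illustration of what a rule-checkable, route-level reward can look like, the sketch below scores a generated multi-step route without any learned critic: every step must disconnect a molecule that is still "open" on the current path (a simple path-coherence check), and the route counts as solved only if all remaining leaves are purchasable building blocks. The names `Step`, `route_reward`, and the `stock` set are assumptions made for this example, not ReTriP's implementation. Molecules are compared as plain SMILES strings, so a real pipeline would canonicalize them first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hypothetical retrosynthetic step: product -> proposed precursors."""
    product: str                # SMILES of the molecule being disconnected
    reactants: tuple[str, ...]  # SMILES of the proposed precursors


def route_reward(target: str, steps: list[Step], stock: set[str]) -> float:
    """Binary, rule-checkable reward for a whole route (illustrative only).

    Returns 1.0 if the steps form a coherent path from `target` and every
    unexpanded leaf is in `stock`; otherwise 0.0.
    """
    open_mols = {target}  # molecules not yet disconnected on this path
    for step in steps:
        # Path coherence: a step may only expand a molecule that is currently open.
        if step.product not in open_mols or not step.reactants:
            return 0.0
        open_mols.remove(step.product)
        open_mols.update(step.reactants)
    # Solved only if every remaining leaf is a purchasable building block.
    return 1.0 if open_mols <= stock else 0.0


# Example: paracetamol from 4-aminophenol and acetic anhydride in one step.
paracetamol = "CC(=O)Nc1ccc(O)cc1"
step = Step(paracetamol, ("Nc1ccc(O)cc1", "CC(=O)OC(C)=O"))
stock = {"Nc1ccc(O)cc1", "CC(=O)OC(C)=O"}
print(route_reward(paracetamol, [step], stock))  # 1.0
```

A binary reward like this is easy to verify but sparse; partial credit (for example, the fraction of leaves found in stock) is one common way to densify it, at the cost of rewarding routes that never close.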