🤖 AI Summary
This work addresses the difficulty large language models (LLMs) have in steering away from chemically sensitive sites during retrosynthetic planning, which can yield chemically invalid or undesirable pathways. The authors propose Protect*, a neuro-symbolic framework that couples SMARTS-based symbolic reasoning with LLM generation. The system automatically identifies and protects reactive functional groups using a database of 55+ SMARTS patterns and 40+ characterized protecting groups, and it combines atom-map-based protection state tracking with a human-in-the-loop mode to guide pathway generation. In case studies on complex natural products, the method yields a novel synthetic pathway for Erythromycin B, which the authors present as evidence that grounding neural generation in symbolic logic makes the proposed syntheses more reliable.
📝 Abstract
Large Language Models (LLMs) have shown remarkable potential in scientific domains like retrosynthesis, yet they often lack the fine-grained control necessary to navigate complex problem spaces without error. A critical challenge is directing an LLM to avoid specific, chemically sensitive sites on a molecule, a task where unconstrained generation can lead to invalid or undesirable synthetic pathways. In this work, we introduce Protect$^*$, a neuro-symbolic framework that grounds the generative capabilities of LLMs in rigorous chemical logic. Our approach combines automated rule-based reasoning, using a comprehensive database of 55+ SMARTS patterns and 40+ characterized protecting groups, with the generative intuition of neural models. The system operates via a hybrid architecture: an "automatic mode" where symbolic logic deterministically identifies and guards reactive sites, and a "human-in-the-loop mode" that integrates expert strategic constraints. Through "active state tracking," we inject hard symbolic constraints into the neural inference process via a dedicated protection state linked to canonical atom maps. We demonstrate this neuro-symbolic approach through case studies on complex natural products, including the discovery of a novel synthetic pathway for Erythromycin B, showing that grounding neural generation in symbolic logic enables reliable, expert-level autonomy.
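To make the idea of a protection state linked to atom maps more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each protected site is recorded by its atom-map number and that each candidate retrosynthetic step reports the atom maps it would change. The names `ProtectionState`, `CandidateStep`, and `filter_candidates`, the atom-map numbers, and the group labels are all hypothetical. SMARTS matching, which would populate the state in automatic mode, is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProtectionState:
    """Hypothetical protection state: atom-map number -> protecting-group label."""

    protected: dict[int, str] = field(default_factory=dict)

    def protect(self, atom_maps, group: str) -> None:
        # Mark each listed atom as guarded by the given protecting group.
        for atom_map in atom_maps:
            self.protected[atom_map] = group

    def deprotect(self, atom_maps) -> None:
        # Release atoms so that later steps may modify them again.
        for atom_map in atom_maps:
            self.protected.pop(atom_map, None)

    def violations(self, changed_atom_maps) -> list[int]:
        # Atom maps that a step would change although they are protected.
        return sorted(set(changed_atom_maps) & set(self.protected))


@dataclass(frozen=True)
class CandidateStep:
    """Hypothetical candidate step, e.g. one proposed by a generative model."""

    name: str
    changed_atom_maps: frozenset[int]


def filter_candidates(state: ProtectionState, candidates):
    # Hard constraint: drop any candidate that touches a protected atom.
    return [c for c in candidates if not state.violations(c.changed_atom_maps)]


# Illustrative usage with made-up atom-map numbers and labels.
state = ProtectionState()
state.protect([7, 12], "TBS")  # stands in for sites found by rule matching
state.protect([3], "Bn")       # stands in for an expert-supplied constraint

candidates = [
    CandidateStep("step touching atom 7", frozenset({7})),
    CandidateStep("step touching atom 15", frozenset({15})),
]
allowed = filter_candidates(state, candidates)
print([c.name for c in allowed])
```

Keying the state by atom-map number rather than by atom index is one plausible reading of "linked to canonical atom maps": a map number can follow an atom across successive transformations, so a constraint set early in the route can still be checked later.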