🤖 AI Summary
Molecular optimization requires both accurate interpretation of semantic editing intents and precise structural modification. Existing large language model (LLM)-based approaches that operate directly on non-intuitive representations such as SMILES often distort the intended edit. This paper proposes an end-to-end molecular editing framework that uses executable Python code as an intermediate representation: natural language editing instructions are translated into Python code, which is then executed deterministically to produce the target molecule. The method integrates LLMs, SMILES parsing, program synthesis, and chemical constraint verification into a unified "intent → code → structure" cascade. Evaluated across diverse optimization tasks, it achieves over 90% edit consistency, an improvement of 38 to 86 percentage points over SMILES-based baselines, while maintaining high structural similarity and higher optimization success rates. The approach improves the controllability, fidelity, and interpretability of molecular editing.
📝 Abstract
Molecular optimization is a central task in drug discovery that requires precise structural reasoning and domain knowledge. While large language models (LLMs) have shown promise in generating high-level editing intentions in natural language, they often struggle to execute these modifications faithfully, particularly when operating on non-intuitive representations like SMILES. We introduce MECo, a framework that bridges reasoning and execution by translating editing actions into executable code. MECo reformulates molecular optimization for LLMs as a two-stage cascade: it first generates human-interpretable editing intentions from a molecule and a property goal, then translates those intentions into executable structural edits via code generation. Our approach achieves over 98% accuracy in reproducing held-out realistic edits derived from chemical reactions and target-specific compound pairs. On downstream optimization benchmarks spanning physicochemical properties and target activities, MECo raises consistency by 38 to 86 percentage points, to above 90%, and achieves higher success rates than SMILES-based baselines while preserving structural similarity. By aligning intention with execution, MECo enables consistent, controllable, and interpretable molecular design, laying the foundation for high-fidelity feedback loops and collaborative human-AI workflows in drug discovery.
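The "intention, then code, then structure" cascade can be illustrated with a small sketch. This is not MECo's implementation: the `ToyMolecule` graph, its `add_atom` and `replace_atom` methods, and the `run_edit` helper are all hypothetical names invented here, and the "generated" edit code is hard-coded where an LLM would normally produce it. A real system would presumably operate on a cheminformatics toolkit's molecule objects and check chemical validity, which this toy does not attempt.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class ToyMolecule:
    """Hypothetical toy graph: atoms are element symbols, bonds are index pairs."""
    atoms: List[str]
    bonds: Set[FrozenSet[int]] = field(default_factory=set)

    def add_atom(self, symbol: str, attach_to: int) -> int:
        """Append a new atom bonded to an existing atom; return its index."""
        self.atoms.append(symbol)
        new_index = len(self.atoms) - 1
        self.bonds.add(frozenset((attach_to, new_index)))
        return new_index

    def replace_atom(self, index: int, symbol: str) -> None:
        """Swap the element at a given position, keeping its bonds."""
        self.atoms[index] = symbol


def run_edit(molecule: ToyMolecule, edit_code: str) -> ToyMolecule:
    """Execute edit code on a copy of the molecule and return the edited copy.

    Emptying __builtins__ only limits the names visible to the snippet;
    it is not a real sandbox (an assumption made for brevity).
    """
    edited = ToyMolecule(list(molecule.atoms), set(molecule.bonds))
    exec(edit_code, {"__builtins__": {}, "mol": edited})
    return edited


# Ethanol as a heavy-atom chain: C(0)-C(1)-O(2).
ethanol = ToyMolecule(["C", "C", "O"], {frozenset((0, 1)), frozenset((1, 2))})

# Stage 1 (assumed LLM output): a human-readable editing intention.
intention = "Replace the hydroxyl oxygen with nitrogen, then add a methyl group to it."

# Stage 2 (assumed LLM output): the intention expressed as executable code.
edit_code = 'mol.replace_atom(2, "N")\nmol.add_atom("C", attach_to=2)'

edited = run_edit(ethanol, edit_code)
print(edited.atoms)  # ['C', 'C', 'N', 'C']
```

Because the edit is ordinary code, running it twice on the same input gives the same structure, and each line can be read against the stated intention. That is the property the abstract refers to as aligning intention with execution.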