Coder as Editor: Code-driven Interpretable Molecular Optimization

📅 2025-10-16
🤖 AI Summary
Molecular optimization requires both accurate interpretation of an editing intent and precise structural modification. Existing large language model (LLM)-based approaches that operate directly on non-intuitive representations such as SMILES often distort that intent. This paper proposes MECo, described as the first end-to-end molecular editing framework that uses executable Python code as an intermediate representation: natural-language editing instructions are translated into Python code that performs the edit, and the code is then executed deterministically to generate the target molecule. The method combines LLMs, SMILES parsing, program synthesis, and chemical constraint verification in a unified “intent → code → structure” cascade. Across diverse optimization tasks it raises edit consistency by 38-86 percentage points over SMILES-based baselines, to above 90%, while maintaining high structural similarity and optimization success rates. The approach improves the controllability, fidelity, and interpretability of molecular editing.
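
The "intent → code → structure" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: `ToyMol` is a hypothetical toy molecular graph standing in for a real cheminformatics toolkit, `GENERATED_CODE` is a hand-written stand-in for what a model might emit, and `run_edit` is a hypothetical helper. The point is only that once the intent is expressed as code, applying it is a deterministic program execution rather than free-form string rewriting.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyMol:
    """Toy molecular graph (hypothetical stand-in for a cheminformatics toolkit)."""
    atoms: dict = field(default_factory=dict)  # atom index -> element symbol
    bonds: set = field(default_factory=set)    # each bond is frozenset({i, j})

    def add_atom(self, element):
        idx = max(self.atoms, default=-1) + 1
        self.atoms[idx] = element
        return idx

    def add_bond(self, i, j):
        if i == j or i not in self.atoms or j not in self.atoms:
            raise ValueError(f"invalid bond {i}-{j}")
        self.bonds.add(frozenset((i, j)))

    def neighbors(self, i):
        return sorted(j for bond in self.bonds if i in bond for j in bond if j != i)

    def element_counts(self):
        counts = {}
        for element in self.atoms.values():
            counts[element] = counts.get(element, 0) + 1
        return counts


# Hypothetical model output for the intention "methylate the hydroxyl oxygen".
GENERATED_CODE = '''
def edit(mol):
    oxygen = next(i for i, el in mol.atoms.items() if el == "O")
    carbon = mol.add_atom("C")
    mol.add_bond(oxygen, carbon)
    return mol
'''


def run_edit(code, mol):
    """Execute generated code on a copy of mol. Note: exec() is not a sandbox."""
    namespace = {}
    exec(code, namespace)
    return namespace["edit"](copy.deepcopy(mol))


# Heavy-atom skeleton of ethanol: C-C-O
start = ToyMol()
c1, c2, o = start.add_atom("C"), start.add_atom("C"), start.add_atom("O")
start.add_bond(c1, c2)
start.add_bond(c2, o)

edited = run_edit(GENERATED_CODE, start)
print(start.element_counts())   # {'C': 2, 'O': 1}
print(edited.element_counts())  # {'C': 3, 'O': 1}
```

Because the edit is a function applied to a copy of the input, the same code on the same molecule always yields the same result, and the generated code itself serves as a readable record of what was changed. A real system would need proper isolation for executing model-generated code and a toolkit that checks valence and other chemical constraints, neither of which this sketch attempts.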

📝 Abstract
Molecular optimization is a central task in drug discovery that requires precise structural reasoning and domain knowledge. While large language models (LLMs) have shown promise in generating high-level editing intentions in natural language, they often struggle to execute these modifications faithfully, particularly when operating on non-intuitive representations like SMILES. We introduce MECo, a framework that bridges reasoning and execution by translating editing actions into executable code. MECo reformulates molecular optimization for LLMs as a two-stage cascade: it first generates human-interpretable editing intentions from a molecule and a property goal, then translates those intentions into executable structural edits via code generation. Our approach achieves over 98% accuracy in reproducing held-out realistic edits derived from chemical reactions and target-specific compound pairs. On downstream optimization benchmarks spanning physicochemical properties and target activities, MECo improves consistency by 38-86 percentage points, to above 90%, and achieves higher success rates than SMILES-based baselines while preserving structural similarity. By aligning intention with execution, MECo enables consistent, controllable, and interpretable molecular design, laying the foundation for high-fidelity feedback loops and collaborative human-AI workflows in drug discovery.
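
The two kinds of numbers quoted in the abstract, an accuracy on held-out edits and a gain in percentage points, can be made concrete with a small sketch. The definitions below are assumptions for illustration only: the abstract does not give the exact scoring rule, so `reproduction_accuracy` simply counts exact matches between predicted and reference product strings (which presumes both are already canonicalized), and the toy SMILES lists and the baseline rate are made-up inputs, not results from the paper.

```python
def reproduction_accuracy(predicted, reference):
    """Fraction of held-out edits whose predicted product equals the reference.

    Assumption: both lists hold canonicalized structure strings, so plain
    string equality is a meaningful comparison.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one held-out edit is required")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def percentage_point_gain(new_rate, baseline_rate):
    """Absolute difference between two rates (each in [0, 1]), in percentage points."""
    return (new_rate - baseline_rate) * 100


# Made-up toy data, for illustration only.
toy_predicted = ["CCO", "CCN", "CCC", "CCF"]
toy_reference = ["CCO", "CCN", "CCC", "CCCl"]

toy_accuracy = reproduction_accuracy(toy_predicted, toy_reference)
toy_gain = percentage_point_gain(toy_accuracy, baseline_rate=0.25)
print(toy_accuracy)  # 0.75
print(toy_gain)      # 50.0
```

A gain stated in percentage points is an absolute difference between two rates, not a relative improvement: in the toy numbers above, going from 25% to 75% is a 50-point gain even though the rate tripled.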
Problem

Research questions and friction points this paper is trying to address.

Bridging reasoning and execution in molecular optimization
Translating editing intentions into executable code
Improving consistency and success rates in drug discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Translates editing intentions into executable code
Uses cascaded framework for molecular optimization
Achieves high accuracy in reproducing realistic edits
Authors

Wenyu Zhu, Institute for AI Industry Research, Tsinghua University
Chengzhu Li, Department of Computer Science and Technology, Tsinghua University
Xiaohe Tian, School of Pharmaceutical Sciences, Peking University
Yifan Wang, Department of Computer Science and Technology, Tsinghua University
Yinjun Jia, Institute for AI Industry Research, Tsinghua University
Jianhui Wang, University of Electronic Science and Technology of China
Bowen Gao, Tsinghua University (AI4Science)
Ya-Qin Zhang, Institute for AI Industry Research, Tsinghua University
Wei-Ying Ma, Tsinghua University (Generative AI and Large Language Models (LLMs) for Science)
Yanyan Lan, Tsinghua University (Information Retrieval, Machine Learning, AI4Science)