Bridging Chemists and AI: An Expert-Augmented Framework for Interpretable Route Evaluation

📅 2026-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses limitations of existing data-driven approaches to evaluating multi-step organic synthesis routes, which often oversimplify the multi-objective nature of route design and rely on proxy data such as patent routes, and therefore struggle to balance feasibility, cost, and efficiency. The authors propose an interpretable scoring framework that explicitly integrates chemists' domain knowledge. A DeepSets-based model is first trained using the tree edit distance between reference and machine-generated routes as a structural similarity signal, then fine-tuned on expert evaluations to output both a quantitative score and a qualitative category (Good, Plausible, or Bad). The reported results are a Spearman correlation of 0.78, a Pearson correlation of 0.77, and a top-1 ranking accuracy of 60.2%, compared with 17.5% for the previous baseline.
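To make the DeepSets idea concrete, here is a minimal sketch of a permutation-invariant route scorer with one regression output and one three-way classification output. It is not the authors' implementation: the per-step feature vectors, the layer sizes, the class name `DeepSetsScorer`, and the untrained random weights are all assumptions chosen for illustration. The only property it demonstrates is that sum pooling over encoded steps makes the output independent of step order.

```python
import math
import random

CATEGORIES = ("Good", "Plausible", "Bad")


def make_linear(n_in, n_out, rng):
    """Return a function x -> W x + b with fixed random weights (untrained)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]

    def apply(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]

    return apply


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DeepSetsScorer:
    """Hypothetical scorer: rho(sum_i phi(step_i)) with two output heads."""

    def __init__(self, n_features, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.phi = make_linear(n_features, hidden, rng)         # per-step encoder
        self.score_head = make_linear(hidden, 1, rng)           # regression head
        self.class_head = make_linear(hidden, len(CATEGORIES), rng)  # category head

    def forward(self, steps):
        # Encode every step independently, then sum: the sum ignores step order.
        pooled = [0.0] * self.hidden
        for step in steps:
            encoded = [max(0.0, v) for v in self.phi(step)]  # ReLU
            pooled = [p + e for p, e in zip(pooled, encoded)]
        score = self.score_head(pooled)[0]
        probs = softmax(self.class_head(pooled))
        return score, dict(zip(CATEGORIES, probs))


# Assumed toy route: three reaction steps, three made-up features each.
route = [[0.2, 1.0, 0.0], [0.7, 0.0, 1.0], [0.1, 0.5, 0.5]]
model = DeepSetsScorer(n_features=3)
score, probabilities = model.forward(route)
```

Calling `model.forward(list(reversed(route)))` yields the same score up to floating-point rounding, which is the reason a set-based architecture is a natural fit when a route is treated as an unordered collection of steps.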
📝 Abstract
Selecting efficient multi-step synthetic routes is a central challenge in organic synthesis, particularly in medicinal and process chemistry, where route choice directly impacts feasibility, cost, and development efficiency. Data-driven assessment systems often oversimplify the multi-objective nature of synthesis design and rely on proxy datasets, such as patent routes, rather than universally grounded criteria. To address this, we introduce an expert-augmented, data-driven scoring framework that integrates machine learning with chemists' domain knowledge for both numerical and explainable route assessment. A DeepSets-based model is trained using tree edit distance between reference and machine-generated routes, and then fine-tuned with expert evaluations to produce both quantitative scores and interpretable qualitative categories: Good, Plausible, and Bad. The resulting system achieves a Spearman correlation coefficient of 0.78 and a Pearson correlation of 0.77 for category assessment prediction, and 60.2% top-1 ranking accuracy for score prediction, substantially outperforming the previous baseline of 17.5%.
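The three reported metrics can be computed as in the sketch below. This is an illustrative implementation in plain Python, not the paper's evaluation code. In particular, the definition of top-1 accuracy used here (the highest-scored candidate for a target is the designated reference route) is an assumption, as are the function names and the toy inputs.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def top1_accuracy(groups):
    """Fraction of targets whose best-scored candidate is the reference.

    groups: list of (predicted_scores, reference_index) pairs, one per target.
    """
    hits = 0
    for scores, reference_index in groups:
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += best == reference_index
    return hits / len(groups)


# Toy data (made up): expert scores vs. model scores for five routes.
expert = [1.0, 2.0, 3.0, 4.0, 5.0]
model = [1.2, 1.9, 3.5, 3.4, 5.1]
rho, r = spearman(expert, model), pearson(expert, model)
```

Spearman compares only the ordering of routes, while Pearson is also sensitive to how far apart the scores are, so reporting both (0.78 and 0.77 in the abstract) indicates agreement in ranking as well as in scale.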
Problem

Research questions and friction points this paper is trying to address.

synthetic route selection
organic synthesis
multi-objective optimization
interpretable evaluation
expert knowledge integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

expert-augmented learning
interpretable route evaluation
DeepSets
tree edit distance
synthetic route scoring
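As a rough illustration of the tree edit distance listed above, the following sketch computes the unit-cost edit distance between two small ordered, labeled trees using the standard rightmost-root forest recursion with memoization. It is an assumption-laden toy: the listing does not say which tree edit distance variant, cost model, or route encoding the authors use, and the node labels here are placeholders rather than real molecules or reactions.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.


def forest_size(forest):
    """Total number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return forest_size(f2)  # insert every node of f2
    if not f2:
        return forest_size(f1)  # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2 (mirror case).
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (forest_distance(kids1, kids2)
             + forest_distance(f1[:-1], f2[:-1])
             + (0 if label1 == label2 else 1))
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Placeholder routes: the target is made from an intermediate (reference)
# or directly from the starting materials (candidate).
reference = ("target", (("intermediate", (("sm1", ()), ("sm2", ()))),))
candidate = ("target", (("sm1", ()), ("sm2", ())))
distance = tree_edit_distance(reference, candidate)  # one deletion
```

A smaller distance means the machine-generated route is structurally closer to the reference route, which is how such a measure can serve as a training signal before any expert labels are used.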
Yujia Guo
Department of Computer Science, Aalto University, Espoo, Finland
Mikhail Kabeshov
AstraZeneca
AI in drug discovery, Computational Chemistry, Machine Learning, Organic Chemistry
Tat Hong Duong Le
Department of Computer Science, Aalto University, Espoo, Finland
Samuel Genheden
R&D AstraZeneca, Gothenburg
Software development, drug design, cheminformatics, computational biochemistry
Marco V. Mijangos
Discovery Sciences R&D, AstraZeneca, Gothenburg, Sweden
Varvara Voinarovska
Discovery Sciences R&D, AstraZeneca, Gothenburg, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden
Giulia Bergonzini
Discovery Sciences R&D, AstraZeneca, Gothenburg, Sweden
Ola Engkvist
AstraZeneca R&D Gothenburg; ORCID: 0000-0003-4970-6461
Cheminformatics, Drug Discovery, Machine Learning, Semantic Web Technologies, Open Innovation
Samuel Kaski
Director, ELLIS Institute Finland; Professor, Aalto University and University of Manchester
Probabilistic machine learning, AI4Science, Collaborative AI