When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current single-step retrosynthesis evaluations rely on a single ground-truth answer and Top-K accuracy, which fail to capture the open-ended nature of real synthesis planning, where more than one route to a target can be valid. To address this, the work proposes a benchmarking framework for large language models, both general-purpose and chemistry-specialized, that replaces exact match with chemical plausibility as the core assessment criterion. The framework is built on ChemCensor, a metric for chemical plausibility, and is accompanied by CREED, a dataset of millions of ChemCensor-validated reaction records intended for LLM training. A model trained on CREED improves over the LLM baselines under this benchmark.

📝 Abstract
Recent progress has expanded the use of large language models (LLMs) in drug discovery, including synthesis planning. However, objective evaluation of retrosynthesis performance remains limited. Existing benchmarks and metrics typically rely on published synthetic procedures and on Top-K accuracy against a single ground truth, which does not capture the open-ended nature of real-world synthesis planning. We propose a new benchmarking framework for single-step retrosynthesis that evaluates both general-purpose and chemistry-specialized LLMs using ChemCensor, a novel metric for chemical plausibility. By emphasizing plausibility over exact match, this approach better aligns with human synthesis planning practices. We also introduce CREED, a novel dataset comprising millions of ChemCensor-validated reaction records for LLM training, and use it to train a model that improves over the LLM baselines under this benchmark.
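To make the contrast between exact match and plausibility concrete, the following is a minimal sketch, not the paper's implementation. It assumes reactant sets are written as dot-separated SMILES that were already canonicalized upstream, and it takes the plausibility check as an arbitrary callable. The names `top_k_exact_match`, `top_k_plausible_fraction`, and `toy_plausible` are hypothetical; `toy_plausible` is a hand-written lookup used only for illustration and says nothing about how ChemCensor scores reactions.

```python
from typing import Callable, FrozenSet, Sequence


def normalize(reactants: str) -> FrozenSet[str]:
    """Order-insensitive view of a dot-separated reactant string.

    Assumption: each SMILES is already canonical, so string equality
    is enough to compare molecules.
    """
    return frozenset(part for part in reactants.strip().split(".") if part)


def top_k_exact_match(predictions: Sequence[str], ground_truth: str, k: int) -> bool:
    """True if any of the first k predictions equals the single recorded answer."""
    target = normalize(ground_truth)
    return any(normalize(p) == target for p in predictions[:k])


def top_k_plausible_fraction(
    predictions: Sequence[str], is_plausible: Callable[[str], bool], k: int
) -> float:
    """Fraction of the first k predictions accepted by a plausibility check."""
    top = predictions[:k]
    if not top:
        return 0.0
    return sum(1 for p in top if is_plausible(p)) / len(top)


# Toy example: routes to N-methylacetamide, CC(=O)NC.
ground_truth = "CC(=O)Cl.CN"   # recorded route: acetyl chloride + methylamine
predictions = [
    "CC(=O)O.CN",              # acetic acid + methylamine (not the recorded route)
    "CN.CC(=O)Cl",             # recorded route, reactants in a different order
    "CC.CN",                   # ethane + methylamine, rejected by the toy check
]

# Hypothetical stand-in for a plausibility metric: a fixed set of accepted routes.
ACCEPTED = {normalize("CC(=O)Cl.CN"), normalize("CC(=O)O.CN")}


def toy_plausible(reactants: str) -> bool:
    return normalize(reactants) in ACCEPTED


top1_hit = top_k_exact_match(predictions, ground_truth, k=1)
top3_hit = top_k_exact_match(predictions, ground_truth, k=3)
plausible_at_2 = top_k_plausible_fraction(predictions, toy_plausible, k=2)
plausible_at_3 = top_k_plausible_fraction(predictions, toy_plausible, k=3)
```

In this toy case the top-ranked prediction counts as a miss under Top-1 exact match even though the stand-in check accepts it, which is the kind of mismatch the abstract describes.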
Problem

Research questions and friction points this paper is trying to address.

retrosynthesis
benchmarking
large language models
synthesis planning
evaluation metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

ChemCensor
CREED
retrosynthesis benchmarking
chemical plausibility
large language models
Authors
B. Zagribelnyy
Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
Ivan Ilin
Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
Maksim Kuznetsov
Senior Research Scientist, Insilico Medicine
deep learning, generative modelling
Nikita Bondarev
Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
Roman Schutski
Insilico Medicine
Quantum computing, machine learning, tensor factorizations, computational physics
Thomas MacDougall
Insilico Medicine Canada Inc., 3710-1250 René-Lévesque Blvd W, Montreal, Quebec, H3B 4W8, Canada
Shayakhmetov Rim
Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
Zulfat Miftakhutdinov
Insilico Medicine Canada Inc., 3710-1250 René-Lévesque Blvd W, Montreal, Quebec, H3B 4W8, Canada
M. Mizera
Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong, Hong Kong SAR, China
Vladimir Aladinskiy
Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
Alexander M. Aliper
Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE
Alex Zhavoronkov
Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE; Insilico Medicine Canada Inc., 3710-1250 René-Lévesque Blvd W, Montreal, Quebec, H3B 4W8, Canada; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong, Hong Kong SAR, China