Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval

📅 2026-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the tendency of large language models (LLMs) to generate hallucinated or outdated suggestions in chemical synthesis planning due to insufficient utilization of reaction knowledge graphs. The authors formulate reaction pathway retrieval as a natural language–to–Cypher query generation task and propose a one-shot prompting strategy grounded in aligned examples, augmented with a checklist-based self-correction mechanism. This approach integrates embedding-driven example selection within a verify-and-revise loop framework to enhance both the accuracy and executability of LLM-generated queries over knowledge graphs. Experimental results demonstrate that the proposed one-shot prompting significantly outperforms baseline methods, while the self-correction mechanism effectively improves query executability in zero-shot settings, though its benefits diminish when high-quality examples are available.
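As a rough illustration of the embedding-driven example selection described above, the sketch below picks the single most similar exemplar for a one-shot prompt using a toy bag-of-words similarity. The graph schema, exemplar pool, and all function names are assumptions for illustration, not the paper's implementation, which would use a learned sentence encoder and the authors' actual reaction-graph schema.

```python
# Hypothetical sketch of embedding-based one-shot exemplar selection for
# Text2Cypher prompting. The "embedding" here is a toy bag-of-words vector.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding; a real system would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Pool of (question, gold Cypher) exemplars. The queries assume a hypothetical
# schema: (:Molecule)-[:REACTANT_OF]->(:Reaction)-[:PRODUCES]->(:Molecule).
EXEMPLARS = [
    ("Which reactions produce aspirin?",
     "MATCH (r:Reaction)-[:PRODUCES]->(m:Molecule {name: 'aspirin'}) RETURN r"),
    ("List the reactants of reaction R42.",
     "MATCH (m:Molecule)-[:REACTANT_OF]->(r:Reaction {id: 'R42'}) RETURN m"),
]

def select_exemplar(question: str):
    # One-shot: ground the prompt in the single most similar aligned example.
    q = embed(question)
    return max(EXEMPLARS, key=lambda ex: cosine(q, embed(ex[0])))

def build_prompt(question: str) -> str:
    ex_q, ex_cypher = select_exemplar(question)
    return (f"Example question: {ex_q}\nExample Cypher: {ex_cypher}\n"
            f"Question: {question}\nCypher:")
```

A question about reactants would retrieve the structurally aligned reactant exemplar rather than the product one, which is the alignment effect the summary credits for the one-shot gains.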

📝 Abstract
Large Language Models (LLMs) can aid synthesis planning in chemistry, but standard prompting methods often yield hallucinated or outdated suggestions. We study LLM interactions with a reaction knowledge graph by casting reaction path retrieval as a Text2Cypher (natural language to graph query) generation problem, and define single- and multi-step retrieval tasks. We compare zero-shot prompting to one-shot variants using static, random, and embedding-based exemplar selection, and assess a checklist-driven validator/corrector loop. To evaluate our framework, we consider query validity and retrieval accuracy. We find that one-shot prompting with aligned exemplars consistently performs best. Our checklist-style self-correction loop mainly improves executability in zero-shot settings and offers limited additional retrieval gains once a good exemplar is present. We provide a reproducible Text2Cypher evaluation setup to facilitate further work on KG-grounded LLMs for synthesis planning. Code is available at https://github.com/Intelligent-molecular-systems/KG-LLM-Synthesis-Retrieval.
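The checklist-driven validator/corrector loop mentioned in the abstract can be sketched as follows. The individual checks, the loop bound, and the `revise` callback (standing in for a re-prompted LLM that sees the failed checks) are all illustrative assumptions; the paper's actual checklist is not reproduced here.

```python
# Minimal sketch of a checklist-style verify-and-revise loop for generated
# Cypher. Checks and names are hypothetical, not the paper's checklist.
CHECKLIST = [
    ("has MATCH clause", lambda q: "MATCH" in q.upper()),
    ("has RETURN clause", lambda q: "RETURN" in q.upper()),
    ("balanced parentheses", lambda q: q.count("(") == q.count(")")),
]

def validate(query: str) -> list[str]:
    # Return the names of the checklist items the query fails.
    return [name for name, check in CHECKLIST if not check(query)]

def verify_and_revise(query: str, revise, max_rounds: int = 3) -> str:
    # `revise` stands in for an LLM call that receives the failed checks
    # and returns a corrected query; loop until clean or out of budget.
    for _ in range(max_rounds):
        failures = validate(query)
        if not failures:
            break
        query = revise(query, failures)
    return query
```

Cheap syntactic checks like these mainly catch non-executable queries, which matches the abstract's finding that the loop helps executability in zero-shot settings but adds little once a good exemplar already anchors the query shape.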
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Reaction Knowledge Graphs
Synthesis Retrieval
Text2Cypher
Hallucination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text2Cypher
knowledge graph grounding
one-shot prompting
self-correction loop
synthesis planning
Olga Bunkova
Department of Intelligent Systems, Delft University of Technology, The Netherlands
Lorenzo Di Fruscia
Department of Intelligent Systems, Delft University of Technology, The Netherlands
Sophia Rupprecht
Delft University of Technology, Department of Chemical Engineering
Artur M. Schweidtmann
Delft University of Technology, Department of Chemical Engineering, Process Intelligence Research
Process Systems Engineering, Artificial Intelligence, Machine Learning, Optimization, Global Optimization
Marcel J. T. Reinders
Department of Intelligent Systems, Delft University of Technology, The Netherlands
Jana M. Weber
Department of Intelligent Systems, Delft University of Technology, The Netherlands