Boosting LLM's Molecular Structure Elucidation with Knowledge Enhanced Tree Search Reasoning

📅 2025-06-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) lack sufficient domain-specific chemical knowledge, which limits their accuracy when mapping spectra to structures in molecular structure elucidation. To address this, we propose K-MSE, a knowledge-enhanced reasoning framework with three core components: (1) an external molecular substructure knowledge base that extends the LLM's coverage of the chemical structure space; (2) a specialized molecule-spectrum scorer that serves as a reward model for evaluating candidate solutions; and (3) Monte Carlo Tree Search (MCTS), used as a plugin for test-time scaling of the reasoning process. K-MSE improves performance by more than 20% on both GPT-4o-mini and GPT-4o. Our implementation is publicly available.

📝 Abstract
Molecular structure elucidation involves deducing a molecule's structure from various types of spectral data, which is crucial in chemical experimental analysis. While large language models (LLMs) have shown remarkable proficiency in analyzing and reasoning through complex tasks, they still encounter substantial challenges in molecular structure elucidation. We identify that these challenges largely stem from LLMs' limited grasp of specialized chemical knowledge. In this work, we introduce a Knowledge-enhanced reasoning framework for Molecular Structure Elucidation (K-MSE), leveraging Monte Carlo Tree Search for test-time scaling as a plugin. Specifically, we construct an external molecular substructure knowledge base to extend the LLMs' coverage of the chemical structure space. Furthermore, we design a specialized molecule-spectrum scorer to act as a reward model for the reasoning process, addressing the issue of inaccurate solution evaluation in LLMs. Experimental results show that our approach significantly boosts performance, particularly gaining more than 20% improvement on both GPT-4o-mini and GPT-4o. Our code is available at https://github.com/HICAI-ZJU/K-MSE.
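The search loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fragment vocabulary, the fixed hypothesis length, and `toy_score` (a position-matching stand-in for the molecule-spectrum scorer) are hypothetical and are not taken from the K-MSE code. The real framework searches over LLM reasoning steps and scores candidates against spectra, whereas this toy version only shows how a scorer can act as the reward signal inside Monte Carlo Tree Search.

```python
import math
import random

FRAGMENTS = ["CH3", "CH2", "OH", "NH2"]   # hypothetical substructure vocabulary
MAX_LEN = 3                                # toy hypotheses are 3 fragments long


def toy_score(candidate, target):
    """Stand-in for a molecule-spectrum scorer: fraction of matching positions."""
    hits = sum(1 for a, b in zip(candidate, target) if a == b)
    return hits / len(target)


class Node:
    def __init__(self, state, parent=None):
        self.state = state          # tuple of fragments chosen so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return len(self.state) == MAX_LEN

    def untried_actions(self):
        tried = {child.state[-1] for child in self.children}
        return [f for f in FRAGMENTS if f not in tried]


def uct_child(node, c=1.4):
    # UCT rule: balance mean reward against an exploration bonus.
    return max(
        node.children,
        key=lambda ch: ch.total_reward / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(target, iterations=300, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_state, best_reward = None, -1.0
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the current node is fully expanded.
        while not node.is_terminal() and not node.untried_actions():
            node = uct_child(node)
        # 2. Expansion: add one unexplored child, if any remain.
        if not node.is_terminal():
            action = rng.choice(node.untried_actions())
            child = Node(node.state + (action,), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: complete the hypothesis at random, then score it.
        state = node.state
        while len(state) < MAX_LEN:
            state = state + (rng.choice(FRAGMENTS),)
        reward = toy_score(state, target)
        if reward > best_reward:
            best_state, best_reward = state, reward
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_state, best_reward
```

Calling `mcts(("CH3", "CH2", "OH"))` returns the highest-scoring hypothesis seen during the search together with its reward. In a real setting the hidden `target` would not exist; the reward would instead come from comparing a candidate molecule with the observed spectra.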

Problem

Research questions and friction points this paper is trying to address.

Improving LLMs' ability to deduce molecular structures from spectral data
Compensating for LLMs' limited grasp of specialized chemical knowledge
Correcting LLMs' inaccurate evaluation of candidate solutions during reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

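Knowledge-enhanced reasoning framework for molecular structure elucidation (K-MSE)
Monte Carlo Tree Search as a plugin for test-time scaling
External molecular substructure knowledge base that extends LLMs' coverage of the chemical structure space
Specialized molecule-spectrum scorer used as a reward model for the reasoning process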
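
As a rough illustration of how an external substructure knowledge base might feed an LLM prompt, the sketch below retrieves candidate substructures for a set of observed peaks. The data and function names are hypothetical and not drawn from the K-MSE repository: the knowledge base is a three-entry dictionary of approximate textbook infrared bands, and retrieval is a simple range check, whereas the paper's knowledge base and retrieval method may differ substantially.

```python
from typing import Dict, List, Tuple

# Hypothetical mini knowledge base: substructure -> approximate IR band (cm^-1).
# The ranges are rough textbook values used only for illustration.
KNOWLEDGE_BASE: Dict[str, Tuple[float, float]] = {
    "carbonyl (C=O)": (1650.0, 1800.0),
    "hydroxyl (O-H)": (3200.0, 3600.0),
    "nitrile": (2210.0, 2260.0),
}


def retrieve_substructures(peaks: List[float]) -> List[str]:
    """Return entries whose band contains at least one observed peak."""
    return [
        name
        for name, (low, high) in KNOWLEDGE_BASE.items()
        if any(low <= p <= high for p in peaks)
    ]


def build_prompt(peaks: List[float]) -> str:
    """Prepend retrieved substructure hints to the elucidation question."""
    hints = retrieve_substructures(peaks)
    hint_text = ", ".join(hints) if hints else "none found"
    return (
        f"Observed peaks (cm^-1): {peaks}\n"
        f"Candidate substructures from the knowledge base: {hint_text}\n"
        "Propose a molecular structure consistent with these observations."
    )
```

For example, `retrieve_substructures([1715.0, 3400.0])` matches the carbonyl and hydroxyl entries, and `build_prompt` folds those hints into the text sent to the model. Keeping retrieval outside the model is what lets the knowledge be extended without retraining.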