Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning

📅 2025-02-25
🤖 AI Summary
To address the lack of structural awareness in retrieval-augmented generation (RAG) for molecular machine learning, this work integrates an end-to-end, noise-robust neural graph matching (NGM) mechanism into a molecular RAG framework for mass spectrum simulation. Methodologically, it combines graph neural networks (GNNs) and differentiable graph matching, which models node- and edge-level affinities, with retrieval-augmented generation and fragmentation-based modeling. This enables fine-grained structural alignment and knowledge fusion between a query molecule and the molecules retrieved for it. The proposed model, MARASON, achieves 28.0% top-1 accuracy on mass spectrum prediction, 9.0 percentage points above the non-retrieval state of the art (19%), and it also outperforms naive retrieval-augmented generation and conventional graph matching baselines. Its core contribution is a structure-aware NGM approach tailored to molecular RAG that remains robust to structural differences between query and retrieved molecules.
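Differentiable graph matching of the kind described above is commonly built from a node-to-node affinity matrix that is normalized into a soft assignment with Sinkhorn iterations. The sketch below illustrates that general pattern in NumPy under stated assumptions; it is not MARASON's architecture. The bilinear affinity, the `weight` matrix, the temperature `tau`, and all function names are hypothetical, and in practice the node embeddings would come from a GNN rather than being written by hand.

```python
import numpy as np

def log_sum_exp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis, keeping dims."""
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))

def node_affinity(h_query, h_ref, weight, tau=1.0):
    """Bilinear affinity between every query node and every reference node.

    h_query: (n, d) node embeddings of the query molecule (e.g. GNN outputs)
    h_ref:   (m, d) node embeddings of the retrieved molecule
    weight:  (d, d) learnable matrix (hypothetical parameterization)
    """
    return (h_query @ weight @ h_ref.T) / tau

def sinkhorn(log_scores, n_iters=50):
    """Turn raw scores into a soft assignment by alternating
    row and column normalization in log space."""
    log_p = np.array(log_scores, dtype=float)
    for _ in range(n_iters):
        log_p = log_p - log_sum_exp(log_p, axis=1)  # each row sums to 1
        log_p = log_p - log_sum_exp(log_p, axis=0)  # each column sums to 1
    return np.exp(log_p)

# Toy example: the query's atoms are a permutation of the reference's atoms.
h_ref = np.eye(3)
h_query = h_ref[[2, 0, 1]]
scores = node_affinity(h_query, h_ref, np.eye(3), tau=0.1)
P = sinkhorn(scores)
print(P.argmax(axis=1))  # [2 0 1]: query atom i is matched to reference atom 2, 0, 1
```

Because every step is differentiable, gradients can flow from a downstream loss back into `weight` and the embeddings, which is what lets affinity metrics be learned end to end.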

📝 Abstract
Molecular machine learning has gained popularity with advances in geometric deep learning. In parallel, retrieval-augmented generation has become a principled approach commonly used with language models. However, the optimal way to integrate retrieval augmentation into molecular machine learning remains unclear. Graph neural networks stand to benefit from matching that captures how retrieved molecules align structurally with a query molecule. Neural graph matching offers a compelling solution by explicitly modeling node and edge affinities between two structural graphs while employing a noise-robust, end-to-end neural network to learn affinity metrics. We apply this approach to mass spectrum simulation and introduce MARASON, a novel model that incorporates neural graph matching to enhance a fragmentation-based neural network. Experimental results highlight the effectiveness of our design, with MARASON achieving 28% top-1 accuracy, a substantial improvement over the non-retrieval state-of-the-art accuracy of 19%. Moreover, MARASON outperforms both naive retrieval-augmented generation methods and traditional graph matching approaches.
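Jointly scoring node and edge affinities between two graphs is classically written as a quadratic assignment problem in Lawler's form: node affinities sit on the diagonal of an affinity matrix K, edge affinities sit off the diagonal, and a matching X is scored by vec(X)ᵀ K vec(X). The toy example below is a sketch under stated assumptions, not the paper's method. It hand-codes affinities (1 for matching element symbols or bond orders) as a stand-in for learned affinity metrics and brute-forces every permutation, which is only feasible for tiny graphs; neural graph matching would instead learn the affinities and solve a differentiable relaxation. The molecules, bond encoding, and function names are illustrative.

```python
import itertools
import numpy as np

def build_affinity_matrix(atoms_q, bonds_q, atoms_r, bonds_r):
    """Lawler-style affinity matrix. Entry (i*m + a, j*m + b) scores mapping
    query atoms (i, j) onto reference atoms (a, b)."""
    n, m = len(atoms_q), len(atoms_r)
    K = np.zeros((n * m, n * m))
    # Node affinities on the diagonal: 1 if element symbols agree (hand-coded stand-in)
    for i in range(n):
        for a in range(m):
            K[i * m + a, i * m + a] = float(atoms_q[i] == atoms_r[a])
    # Edge affinities off the diagonal: 1 if bond orders agree.
    # Bonds are undirected, so both orientations of the reference bond are scored.
    for (i, j), order_q in bonds_q.items():
        for (a, b), order_r in bonds_r.items():
            if order_q != order_r:
                continue
            for x, y in ((a, b), (b, a)):
                K[i * m + x, j * m + y] = 1.0
                K[j * m + y, i * m + x] = 1.0
    return K

def qap_score(X, K):
    """Lawler QAP objective vec(X)^T K vec(X), using row-major vec (index i*m + a)."""
    x = X.reshape(-1)
    return float(x @ K @ x)

def best_permutation(K, n):
    """Exhaustive search over one-to-one matchings; only feasible for tiny graphs."""
    return max(itertools.permutations(range(n)),
               key=lambda p: qap_score(np.eye(n)[list(p)], K))

# Heavy atoms of ethanol listed as C-C-O (query) and as O-C-C (reference)
atoms_q, bonds_q = ["C", "C", "O"], {(0, 1): 1, (1, 2): 1}
atoms_r, bonds_r = ["O", "C", "C"], {(0, 1): 1, (1, 2): 1}
K = build_affinity_matrix(atoms_q, bonds_q, atoms_r, bonds_r)
best = best_permutation(K, 3)
print(best, qap_score(np.eye(3)[list(best)], K))  # (2, 1, 0) 7.0
```

The identity matching also preserves both bonds but matches only one element, so it scores 5.0; node and edge terms together single out the chemically correct alignment.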
Problem

Research questions and friction points this paper is trying to address.

Integration of retrieval augmentation in molecular machine learning
Neural graph matching for structural alignment understanding
Improvement in mass spectrum simulation accuracy
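The accuracy figures quoted above are top-1 accuracies. A common protocol for spectrum simulators, assumed here rather than taken from the paper, simulates a spectrum for every candidate structure, ranks candidates by cosine similarity to the experimental spectrum, and counts a hit when the true structure ranks first. The sketch below uses binned intensity vectors and hypothetical function names.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two binned intensity vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_accuracy(queries, k=1):
    """Fraction of queries whose true structure ranks within the top k.

    queries: list of (observed, predicted, true_idx), where observed is the
    experimental spectrum, predicted holds one simulated spectrum per
    candidate structure, and true_idx indexes the correct candidate.
    """
    hits = 0
    for observed, predicted, true_idx in queries:
        sims = np.array([cosine(observed, p) for p in predicted])
        ranking = np.argsort(-sims, kind="stable")  # best-matching candidate first
        hits += int(true_idx in ranking[:k])
    return hits / len(queries)

# Two toy queries with four m/z bins and two candidates each
queries = [
    (np.array([0.0, 1.0, 0.0, 0.5]),
     [np.array([0.0, 1.0, 0.0, 0.4]), np.array([1.0, 0.0, 0.0, 0.0])], 0),
    (np.array([1.0, 0.0, 1.0, 0.0]),
     [np.array([0.0, 0.0, 1.0, 1.0]), np.array([1.0, 0.0, 0.9, 0.0])], 0),
]
print(top_k_accuracy(queries, k=1), top_k_accuracy(queries, k=2))  # 0.5 1.0
```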
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural graph matching enhances molecular retrieval
End-to-end neural network learns affinity metrics
MARASON model improves mass spectrum simulation accuracy
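One plausible way a learned matching can feed retrieved knowledge into a query model is to carry per-atom information from the retrieved molecule over to the query atoms it is matched with. The sketch below is a hypothetical illustration of that idea, not MARASON's published mechanism: `transfer_reference_features`, `fuse`, and the meaning of `ref_features` are assumptions, and `P` stands for any hard or soft assignment produced by a graph matcher.

```python
import numpy as np

def transfer_reference_features(P, ref_features):
    """Project per-atom information from a retrieved molecule onto query atoms.

    P: (n, m) assignment (rows: query atoms, columns: reference atoms)
    ref_features: (m, k) hypothetical per-atom features of the retrieved molecule
    Returns (n, k): each query atom gets a P-weighted average of reference features.
    """
    row_mass = P.sum(axis=1, keepdims=True)
    return (P @ ref_features) / np.clip(row_mass, 1e-12, None)

def fuse(query_features, P, ref_features):
    """Concatenate each query atom's own features with its matched reference features."""
    return np.concatenate(
        [query_features, transfer_reference_features(P, ref_features)], axis=1)

# Hard matching: query atoms 0, 1, 2 -> reference atoms 2, 1, 0
P = np.eye(3)[[2, 1, 0]]
ref_features = np.array([[0.9], [0.1], [0.4]])
fused = fuse(np.zeros((3, 2)), P, ref_features)
print(fused[:, -1])  # [0.4 0.1 0.9]
```

With a soft assignment, each query atom instead receives a blend of several reference atoms, which keeps the fusion step differentiable with respect to the matcher.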
Runzhong Wang
Postdoc, MIT
combinatorial optimization, computational metabolomics, graph matching
Rui-Xi Wang
Massachusetts Institute of Technology, Cambridge, MA, United States
Mrunali Manjrekar
Massachusetts Institute of Technology, Cambridge, MA, United States
Connor W. Coley
Massachusetts Institute of Technology
machine learning, drug discovery, automation, synthetic chemistry