FraGNNet: A Deep Probabilistic Model for Mass Spectrum Prediction

📅 2024-04-02
🏛️ arXiv.org
📈 Citations: 5
Influential: 1
🤖 AI Summary
Existing compound-to-mass-spectrum (C2MS) prediction models suffer from low mass accuracy, poor generalizability, limited interpretability, and scalability challenges. To address these issues, this paper proposes FraGNNet, a deep probabilistic model that integrates graph neural networks with variational inference. The method introduces a structured latent space to explicitly model the MS/MS generation mechanism: it jointly encodes molecular graph structure, explicitly represents fragmentation pathways, and probabilistically estimates both peak intensities and their uncertainties. The resulting framework enables end-to-end, high-resolution, and interpretable spectrum generation. Experimentally, it achieves state-of-the-art prediction accuracy, measured by spectral error metrics, and surpasses existing C2MS models in retrieval-based compound identification. Notably, it improves identification rates for low-abundance and uncharacterized compounds in complex mixtures, demonstrating robustness and practical utility in real-world metabolomics applications.

📝 Abstract
The process of identifying a compound from its mass spectrum is a critical step in the analysis of complex mixtures. Typical solutions for the mass spectrum to compound (MS2C) problem involve matching the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound to mass spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted spectra. Unfortunately, many existing C2MS models suffer from problems with prediction resolution, scalability, or interpretability. We develop a new probabilistic method for C2MS prediction, FraGNNet, that can efficiently and accurately predict high-resolution spectra. FraGNNet uses a structured latent space to provide insight into the underlying processes that define the spectrum. Our model achieves state-of-the-art performance in terms of prediction error, and surpasses existing C2MS models as a tool for retrieval-based MS2C.
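The library-matching approach described in the abstract can be illustrated with a small sketch. This is not the FraGNNet code or its evaluation pipeline: it assumes spectra are lists of (m/z, intensity) peaks, bins them at a fixed width, and ranks library entries by cosine similarity. The function names, the bin width, and the two library entries are hypothetical. In a C2MS-augmented setting, the library would hold predicted spectra alongside measured ones.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Map (m/z, intensity) peaks to a sparse dict keyed by integer bin index."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_peaks, library, bin_width=0.01):
    """Return (name, score) pairs sorted from best to worst match."""
    query = bin_spectrum(query_peaks, bin_width)
    scored = [
        (name, cosine_similarity(query, bin_spectrum(peaks, bin_width)))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical library: entries could be measured or model-predicted spectra.
library = {
    "compound_A": [(50.0, 1.0), (100.0, 0.5)],
    "compound_B": [(60.0, 1.0), (120.0, 0.2)],
}
query = [(50.0, 0.9), (100.0, 0.6)]
ranked = rank_candidates(query, library)
```

Here `ranked` lists `compound_A` first, since the query shares both of its peaks and none of `compound_B`'s. A narrower bin width rewards high-resolution predictions, which is one reason prediction resolution matters for retrieval.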

Problem

Research questions and friction points this paper is trying to address.

Predicting tandem mass spectra from molecular structures accurately
Overcoming limitations of incomplete spectral libraries in compound identification
Improving mass accuracy and generalization in spectrum prediction models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep probabilistic model for MS/MS prediction
Learns distribution over molecule fragments
Achieves high mass accuracy and generalization
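
As a toy illustration of how a distribution over molecule fragments can yield a spectrum, the sketch below places each fragment's probability at that fragment's mass and merges fragments that share a mass. It is a simplification under stated assumptions, not the model described in the paper: the fragment names, masses, and probabilities are made up, and effects such as adducts or isotopes are ignored.

```python
from collections import defaultdict


def spectrum_from_fragments(fragment_probs, fragment_masses, decimals=4):
    """Turn a fragment distribution into sorted (m/z, intensity) peaks.

    fragment_probs: dict of fragment id -> non-negative weight
    fragment_masses: dict of fragment id -> mass (hypothetical values)
    Fragments whose masses agree to `decimals` places share one peak.
    """
    total = sum(fragment_probs.values())
    if total <= 0:
        raise ValueError("fragment weights must sum to a positive value")
    spectrum = defaultdict(float)
    for fragment, weight in fragment_probs.items():
        mz = round(fragment_masses[fragment], decimals)
        spectrum[mz] += weight / total  # normalize so intensities sum to 1
    return sorted(spectrum.items())


# Hypothetical fragments: frag_1 and frag_2 share a mass and so share a peak.
fragment_masses = {"frag_1": 91.0542, "frag_2": 91.0542, "frag_3": 65.0386}
fragment_probs = {"frag_1": 0.5, "frag_2": 0.25, "frag_3": 0.25}
peaks = spectrum_from_fragments(fragment_probs, fragment_masses)
```

Because every peak is a sum over named fragments, each intensity can be traced back to the fragments that contribute to it, which is the kind of insight a structured latent space is meant to provide.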
A. Young
Department of Computer Science, University of Toronto, Toronto, Canada; Vector Institute for Artificial Intelligence, Toronto, Canada; Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
Fei Wang
Department of Computing Science, University of Alberta, Edmonton, Canada; Alberta Machine Intelligence Institute, Edmonton, Canada
D. Wishart
Department of Computing Science, University of Alberta, Edmonton, Canada; Department of Biological Sciences, University of Alberta, Edmonton, Alberta; Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, Alberta; Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta
Bo Wang
Department of Computer Science, University of Toronto, Toronto, Canada; Vector Institute for Artificial Intelligence, Toronto, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada; Peter Munk Cardiac Centre, University Health Network, Toronto, Canada
Hannes L. Röst
Department of Molecular Genetics, University of Toronto, Toronto, Canada
Russell Greiner
Professor of Computing Science, University of Alberta; CIFAR AI Chair
Artificial Intelligence; Machine Learning; Survival Prediction; Medical Informatics; Evidence-based