🤖 AI Summary
Existing compound-to-mass-spectrum (C2MS) prediction models suffer from limited prediction resolution, scalability, or interpretability. To address these issues, this paper proposes FraGNNet, a deep probabilistic model that integrates graph neural networks with variational inference. The method introduces a structured latent space to model the MS/MS generation mechanism explicitly: it encodes the molecular graph structure, represents fragmentation pathways, and probabilistically estimates both peak intensities and their uncertainties. The resulting framework enables end-to-end, high-resolution, and interpretable spectrum prediction. Experimentally, it achieves state-of-the-art accuracy as measured by spectral prediction error, and it surpasses existing C2MS models in retrieval-based compound identification. Notably, it improves identification rates for low-abundance and uncharacterized compounds in complex mixtures, which points to robustness and practical utility in real-world metabolomics applications.
📝 Abstract
Identifying a compound from its mass spectrum is a critical step in the analysis of complex mixtures. Typical solutions to the mass-spectrum-to-compound (MS2C) problem match the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound-to-mass-spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted spectra. Unfortunately, many existing C2MS models are limited in prediction resolution, scalability, or interpretability. We develop a new probabilistic method for C2MS prediction, FraGNNet, that can efficiently and accurately predict high-resolution spectra. FraGNNet uses a structured latent space to provide insight into the underlying processes that define the spectrum. Our model achieves state-of-the-art performance in terms of prediction error and surpasses existing C2MS models as a tool for retrieval-based MS2C.
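To make the retrieval-based MS2C setting concrete, the following is a minimal sketch of library matching in which measured entries are augmented with predicted spectra. It is not FraGNNet's implementation: the binned cosine similarity, the bin width, the function names, and the toy compounds and peak lists are all illustrative assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra stored as sparse dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_peaks, library, bin_width=0.01):
    """Rank library entries by similarity to the query, best match first."""
    query = bin_spectrum(query_peaks, bin_width)
    scored = [
        (name, cosine_similarity(query, bin_spectrum(peaks, bin_width)))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical library: one measured entry plus one entry that, in this
# sketch, stands in for a spectrum predicted by a C2MS model.
library = {
    "compound_A (measured)": [(77.04, 0.3), (105.03, 1.0)],
    "compound_B (predicted)": [(91.05, 1.0), (119.08, 0.4)],
}

# Hypothetical unknown spectrum to identify.
query = [(91.051, 0.9), (119.082, 0.5)]

ranked = rank_candidates(query, library)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setup the query can only be matched because the predicted entry is present in the library, which is the coverage gap that C2MS augmentation is meant to close. The bin width also illustrates why prediction resolution matters: coarser bins merge nearby fragment peaks and make candidates harder to tell apart.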