FlowMS: Flow Matching for De Novo Structure Elucidation from Mass Spectra

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses de novo molecular structure elucidation from mass spectra, a task made difficult by the vastness of chemical space and the ambiguity of fragmentation patterns. The authors propose FlowMS, the first spectrum-conditioned molecular generation framework based on discrete flow matching. FlowMS iteratively refines molecular graphs in probability space, conditioning on spectral embeddings from a pretrained formula transformer encoder while enforcing explicit chemical formula constraints. On the NPLIB1 benchmark, FlowMS achieves state-of-the-art performance on five of six metrics, including a top-1 accuracy of 9.15% (a 9.7% relative improvement over DiffMS) and a top-10 MCES of 7.96 (a 4.2% improvement over MS-BART). Visualizations of the generated molecules indicate structurally plausible candidates that closely resemble the ground-truth structures.

📝 Abstract
Mass spectrometry (MS) stands as a cornerstone analytical technique for molecular identification, yet de novo structure elucidation from spectra remains challenging due to the combinatorial complexity of chemical space and the inherent ambiguity of spectral fragmentation patterns. Recent deep learning approaches, including autoregressive sequence models, scaffold-based methods, and graph diffusion models, have made progress. However, diffusion-based generation for this task remains computationally demanding. Meanwhile, discrete flow matching, which has shown strong performance for graph generation, has not yet been explored for spectrum-conditioned structure elucidation. In this work, we introduce FlowMS, the first discrete flow matching framework for spectrum-conditioned de novo molecular generation. FlowMS generates molecular graphs through iterative refinement in probability space, enforcing chemical formula constraints while conditioning on spectral embeddings from a pretrained formula transformer encoder. Notably, it achieves state-of-the-art performance on 5 out of 6 metrics on the NPLIB1 benchmark: 9.15% top-1 accuracy (9.7% relative improvement over DiffMS) and 7.96 top-10 MCES (4.2% improvement over MS-BART). We also visualize the generated molecules, which further demonstrate that FlowMS produces structurally plausible candidates closely resembling ground truth structures. These results establish discrete flow matching as a promising paradigm for mass spectrometry-based structure elucidation in metabolomics and natural product discovery.
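To make "iterative refinement in probability space" under a formula constraint more concrete, here is a minimal toy sketch of one common discrete flow matching sampler. It is not the FlowMS implementation. Several things are assumptions for illustration only: the heavy atoms are fixed up front from the formula and only bond states are generated, the noise prior is uniform over four bond states, and `toy_denoiser` is a hypothetical stand-in for a trained, spectrum-conditioned network (here it simply prefers a chain and ignores any spectrum). The update rule is the standard Euler step for a mixture path: at time t, each bond jumps to a state sampled from the predicted clean distribution with probability dt / (1 - t).

```python
import random
import re

BOND_STATES = 4  # toy choice: 0 = no bond, 1 = single, 2 = double, 3 = triple


def heavy_atoms(formula):
    """Expand a formula such as 'C2H6O' into its heavy-atom list ['C', 'C', 'O']."""
    atoms = []
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element != "H":  # hydrogens are left implicit in this sketch
            atoms.extend([element] * int(count or 1))
    return atoms


def toy_denoiser(edges, t):
    """Hypothetical stand-in for a trained, spectrum-conditioned network.

    Returns a probability vector over bond states for every atom pair.
    This toy version ignores its inputs and favours a simple chain.
    """
    probs = {}
    for (i, j) in edges:
        if j == i + 1:
            probs[(i, j)] = [0.05, 0.85, 0.08, 0.02]
        else:
            probs[(i, j)] = [0.97, 0.01, 0.01, 0.01]
    return probs


def sample_bonds(formula, n_steps=20, seed=0):
    """Sample bond states for a fixed atom set by iterative refinement."""
    rng = random.Random(seed)
    atoms = heavy_atoms(formula)  # formula constraint: the atom set never changes
    n = len(atoms)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    # t = 0: every bond state is drawn from the uniform noise prior.
    edges = {pair: rng.randrange(BOND_STATES) for pair in pairs}

    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt
        probs = toy_denoiser(edges, t)
        # Euler step of the mixture-path flow; the final step always jumps.
        jump_prob = 1.0 if step == n_steps - 1 else dt / (1.0 - t)
        for pair in pairs:
            if rng.random() < jump_prob:
                edges[pair] = rng.choices(range(BOND_STATES), weights=probs[pair])[0]
    return atoms, edges


if __name__ == "__main__":
    atoms, edges = sample_bonds("C2H6O")
    print(atoms)
    print(edges)
```

Because the atom list is derived once from the formula and never modified, every sample satisfies the heavy-atom composition by construction. A real system would additionally need valence checks and a learned denoiser conditioned on the spectrum embedding, neither of which is modelled here.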
Problem

Research questions and friction points this paper is trying to address.

de novo structure elucidation
mass spectrometry
molecular identification
spectral fragmentation
chemical space
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete flow matching
de novo structure elucidation
mass spectrometry
molecular graph generation
spectrum-conditioned generation