🤖 AI Summary
This work addresses the challenge of accurately reconstructing complex isomeric structures in mass spectrometry–driven molecular generation, a task on which conventional methods fall short because they neglect many-body interactions. To overcome this limitation, we propose MBGen, a framework that, for the first time in this context, explicitly models many-body interactions rather than relying on atom-centered and pairwise interaction paradigms alone. Built on a diffusion model architecture, MBGen integrates a many-body attention mechanism, high-order graph neural networks, and mass spectral fragment fingerprint encoding to enable end-to-end de novo molecular generation with explicit isomer differentiation. On the NPLIB1 and MassSpecGym benchmarks, MBGen improves on state-of-the-art methods by up to 230%, indicating a substantially enhanced capability to model non-local fragmentation mechanisms and complex spectral patterns.
📝 Abstract
Molecular structure generation from mass spectrometry is fundamental for understanding cellular metabolism and discovering novel compounds. Although tandem mass spectrometry (MS/MS) enables the high-throughput acquisition of fragment fingerprints, these spectra often reflect higher-order interactions in which multiple atoms and bonds cleave in concert, and these interactions are crucial for resolving complex isomers and non-local fragmentation mechanisms. However, most existing methods model only atom-centric and pairwise interactions; they overlook higher-order edge interactions and cannot systematically capture the many-body characteristics essential for structure generation. To overcome these limitations, we present MBGen, a Many-Body enhanced diffusion framework for de novo molecular structure Generation from mass spectra. By integrating a many-body attention mechanism and higher-order edge modeling, MBGen leverages the rich structural information encoded in MS/MS spectra, enabling accurate de novo generation and isomer differentiation for novel molecules. Experimental results on the NPLIB1 and MassSpecGym benchmarks show that MBGen achieves improvements of up to 230% over state-of-the-art methods, highlighting the scientific value and practical utility of many-body modeling for mass spectrometry-based molecular generation. Further analysis and ablation studies show that our approach effectively captures higher-order interactions and exhibits enhanced sensitivity to complex isomeric and non-local fragmentation information.
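
To make the distinction between pairwise and higher-order edge modeling concrete, the following is a minimal illustrative sketch; it is not MBGen's implementation. It assumes a molecule is given as a list of bonds between integer atom indices, and it enumerates pairs of bonds that share an atom, which is the simplest kind of bond-to-bond (edge-level) relation. The function name `bond_pairs_sharing_atom` and the two example skeletons are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def bond_pairs_sharing_atom(bonds):
    """Return all unordered pairs of bonds that share an atom.

    `bonds` is a list of (atom_i, atom_j) tuples. Each returned pair is a
    three-atom relation (two bonds meeting at one atom), which a purely
    pairwise, atom-centered description does not represent explicitly.
    """
    # Group bonds by the atoms they touch.
    incident = {}
    for bond in bonds:
        for atom in bond:
            incident.setdefault(atom, []).append(bond)

    # Every pair of bonds incident to the same atom is one bond-bond relation.
    pairs = set()
    for atom_bonds in incident.values():
        for first, second in combinations(sorted(atom_bonds), 2):
            pairs.add((first, second))
    return sorted(pairs)


# Two hypothetical carbon skeletons with the same number of atoms and bonds
# (hydrogens omitted): a linear chain and a branched isomer.
chain_bonds = [(0, 1), (1, 2), (2, 3)]      # C0-C1-C2-C3
branched_bonds = [(0, 1), (1, 2), (1, 3)]   # C1 bonded to C0, C2, C3

chain_pairs = bond_pairs_sharing_atom(chain_bonds)
branched_pairs = bond_pairs_sharing_atom(branched_bonds)

print(len(chain_pairs), len(branched_pairs))
```

Both skeletons have four atoms and three bonds, so atom and bond counts alone cannot separate them, whereas the number of bond-bond relations differs between the chain and the branched isomer. This is only a toy view of why edge-level, many-body features can help with isomer differentiation; the attention mechanism and diffusion model described above are considerably more involved.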