De Novo Molecular Generation from Mass Spectra via Many-Body Enhanced Diffusion

📅 2026-02-02
🤖 AI Summary
This work addresses the challenge of accurately reconstructing complex isomeric structures in mass spectrometry–driven molecular generation, a task hindered by conventional methods that neglect higher-order many-body interactions. To overcome this limitation, we propose MBGen, a novel framework that, for the first time, explicitly models many-body interactions in this context, moving beyond traditional atom-centered and pairwise interaction paradigms. Built upon a diffusion model architecture, MBGen integrates a many-body attention mechanism, high-order graph neural networks, and mass spectral fragment fingerprint encoding to enable end-to-end de novo molecular generation with explicit isomer differentiation. Evaluated on the NPLIB1 and MassSpecGym benchmarks, MBGen outperforms state-of-the-art methods by up to 230%, demonstrating substantially enhanced capability in modeling non-local fragmentation mechanisms and complex spectral patterns.

📝 Abstract
Molecular structure generation from mass spectrometry is fundamental for understanding cellular metabolism and discovering novel compounds. Although tandem mass spectrometry (MS/MS) enables the high-throughput acquisition of fragment fingerprints, these spectra often reflect higher-order interactions involving the concerted cleavage of multiple atoms and bonds, which are crucial for resolving complex isomers and non-local fragmentation mechanisms. However, most existing methods model interactions in an atom-centric, pairwise fashion; they overlook higher-order edge interactions and cannot systematically capture the many-body characteristics essential for structure generation. To overcome these limitations, we present MBGen, a Many-Body enhanced diffusion framework for de novo molecular structure Generation from mass spectra. By integrating a many-body attention mechanism and higher-order edge modeling, MBGen leverages the rich structural information encoded in MS/MS spectra, enabling accurate de novo generation and isomer differentiation for novel molecules. Experimental results on the NPLIB1 and MassSpecGym benchmarks demonstrate that MBGen achieves superior performance, with improvements of up to 230% over state-of-the-art methods, highlighting the scientific value and practical utility of many-body modeling for mass spectrometry-based molecular generation. Further analysis and ablation studies show that our approach effectively captures higher-order interactions and exhibits enhanced sensitivity to complex isomeric and non-local fragmentation information.
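The contrast the abstract draws between pairwise modeling and higher-order edge interactions can be made concrete with a toy sketch. The code below is not MBGen's architecture, which the abstract does not specify. It only illustrates, under stated assumptions, the difference between a two-body attention score (one atom attends to another) and a three-body update in which an edge (i, j) aggregates over pairs of edges (i, k) and (k, j) that share a third atom k. The function names, the elementwise-product message, and the random features are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pairwise_attention(h):
    """Two-body attention: the score for (i, j) depends only on atoms i and j.

    h: (n_atoms, d) atom features. Returns updated atom features, shape (n_atoms, d).
    """
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)           # (n, n)
    return softmax(scores, axis=-1) @ h     # (n, d)

def three_body_edge_update(e):
    """Toy three-body update of edge features (an illustrative assumption).

    e: (n_atoms, n_atoms, d) edge features. Edge (i, j) attends over every
    intermediate atom k via the edge pair (i, k), (k, j), so each update
    involves three atoms at once rather than two.
    """
    d = e.shape[-1]
    # scores[i, j, k] = <e[i, k], e[k, j]> / sqrt(d)
    scores = np.einsum("ikd,kjd->ijk", e, e) / np.sqrt(d)
    weights = softmax(scores, axis=-1)      # normalise over k
    # message[i, j] = sum_k weights[i, j, k] * (e[i, k] * e[k, j])
    message = np.einsum("ijk,ikd,kjd->ijd", weights, e, e)
    return e + message                      # residual connection

# Random features standing in for a 5-atom molecular graph (hypothetical data).
rng = np.random.default_rng(0)
atom_feats = rng.normal(size=(5, 8))
edge_feats = rng.normal(size=(5, 5, 8))

atoms_out = pairwise_attention(atom_feats)
edges_out = three_body_edge_update(edge_feats)
print(atoms_out.shape, edges_out.shape)
```

The three-body update costs O(n³) in the number of atoms, compared with O(n²) for pairwise attention, which is the usual trade-off when moving to higher-order interaction terms.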
Problem

Research questions and friction points this paper is trying to address.

de novo molecular generation
mass spectrometry
many-body interactions
isomer differentiation
higher-order fragmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

many-body interactions
diffusion model
de novo molecular generation
mass spectrometry
higher-order edge modeling
Xichen Sun
Tilburg University
sustainable operations, circular economy, servitization
Wentao Wei
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China
Jiahua Rao
Sun Yat-sen University
AI4Science, Multi-scale Learning
Jiancong Xie
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China
Yuedong Yang
School of Computer Science and Engineering, Sun Yat-sen University
Structural Bioinformatics, Digital Cell