Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra

📅 2025-10-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenge of structural elucidation for novel compounds absent from existing databases, this paper introduces an end-to-end method for generating molecular structures from mass spectrometry data. It maps tandem mass spectra (MS/MS) and molecular formulae directly to complete molecular structures, bypassing conventional intermediate steps such as database searching, fragment annotation, or fingerprint prediction. The core idea is to combine a pre-trained Transformer with test-time tuning (TTT), so that the model adapts to novel experimental spectra without manual annotations. On the NPLIB1 and MassSpecGym benchmarks, the method outperforms the state-of-the-art DiffMS by 100% and 20%, respectively. On MassSpecGym, TTT yields a 62% relative performance gain over conventional fine-tuning. When a prediction deviates from the ground truth, the generated candidates remain structurally accurate, which gives useful guidance for human interpretation of structurally novel compounds.

📝 Abstract

Tandem mass spectrometry enables the identification of unknown compounds in crucial fields such as metabolomics, natural product discovery and environmental analysis. However, current methods rely on matching against databases of previously observed molecules, or on multi-step pipelines that require intermediate fragment or fingerprint prediction. This makes finding the correct molecule highly challenging, particularly for compounds absent from reference databases. We introduce a framework that, by leveraging test-time tuning, enhances the learning of a pre-trained transformer model to address this gap, enabling end-to-end de novo molecular structure generation directly from the tandem mass spectra and molecular formulae, bypassing manual annotations and intermediate steps. We surpass the de facto state-of-the-art approach, DiffMS, on two popular benchmarks, NPLIB1 and MassSpecGym, by 100% and 20%, respectively. Test-time tuning on experimental spectra allows the model to dynamically adapt to novel spectra, and the relative performance gain over conventional fine-tuning is 62% on MassSpecGym. When predictions deviate from the ground truth, the generated molecular candidates remain structurally accurate, providing valuable guidance for human interpretation and more reliable identification.
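
The percentages above read most naturally as relative gains over a baseline score, although the abstract does not name the underlying metric. A minimal sketch of that arithmetic follows, assuming a higher-is-better score; the function name `relative_gain` and all input values are hypothetical placeholders, not results from the paper.

```python
def relative_gain(new_score: float, baseline_score: float) -> float:
    """Return the relative improvement of new_score over baseline_score.

    A return value of 1.0 corresponds to a 100% gain (the score doubled);
    0.2 corresponds to a 20% gain.
    """
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (new_score - baseline_score) / baseline_score


# Placeholder scores chosen only to illustrate the arithmetic.
# They are NOT figures reported in the paper.
doubled = relative_gain(20.0, 10.0)    # score doubled      -> 100% gain
one_fifth = relative_gain(12.0, 10.0)  # score up by a fifth -> 20% gain

print(f"{doubled:.0%}, {one_fifth:.0%}")
```

Read this way, a 100% gain on NPLIB1 would mean the reported score doubled relative to DiffMS, while a 20% gain on MassSpecGym would mean it rose by one fifth.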

Problem

Research questions and friction points this paper is trying to address.

Generates molecular structures directly from MS/MS spectra

Overcomes database dependency for unknown compound identification

Eliminates intermediate steps in tandem mass spectrometry analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-time tuning enhances pre-trained transformer models

Enables end-to-end molecular generation from spectra

Bypasses manual annotations and intermediate prediction steps

🔎 Similar Papers

Can LLMs Generate Diverse Molecules? Towards Alignment with Structural Diversity