JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data

📅 2024-11-18

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Low accuracy in assigning molecular structures to mass spectra remains a critical bottleneck in untargeted metabolomics. This paper proposes JESTR, an annotation method based on joint embedding: molecular structures and their corresponding mass spectra are mapped into a shared embedding space, and candidate molecules are ranked by the cosine similarity between their embeddings and that of the query spectrum. This bypasses the conventional paradigms of predicting molecular fingerprints from spectra or generating spectra from molecules. The authors also introduce regularization with candidate molecules during training, which improves the model's ability to distinguish the target molecule from other candidate molecules. Evaluated against mol-to-spec and spec-to-FP annotation tools on three datasets, JESTR outperforms them by 23.6%–71.6% on average for rank@1 through rank@5; separately, the candidate-molecule regularization raises rank@1 performance by 11.4%. The work offers a promising new avenue toward accurate metabolite annotation in mass spectrometry-based metabolomics.
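A minimal sketch of the ranking step, in Python with only the standard library, assuming the query spectrum and each candidate molecule have already been encoded into vectors of the same dimension by trained encoders (not shown). The functions `cosine_similarity` and `rank_candidates`, the candidate IDs, and the two-dimensional toy vectors are hypothetical illustrations, not code from the JESTR repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def rank_candidates(
    spectrum_emb: Sequence[float],
    candidate_embs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every candidate against the query spectrum embedding and
    return (candidate_id, score) pairs from most to least similar."""
    scored = [
        (cid, cosine_similarity(spectrum_emb, emb))
        for cid, emb in candidate_embs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

For example, `rank_candidates([1.0, 0.0], {"mol_A": [0.9, 0.1], "mol_B": [0.0, 1.0], "mol_C": [0.7, 0.7]})` orders the candidates mol_A, mol_C, mol_B, because mol_A points almost exactly along the spectrum embedding while mol_B is orthogonal to it.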

📝 Abstract

Motivation: A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectrum prediction and in spectrum-to-molecular fingerprint (FP) prediction, annotation rates remain low.

Results: In this paper, we introduce JESTR, a novel paradigm for annotation. Unlike prior approaches that explicitly construct molecular fingerprints or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data, and embeds their representations in a joint space. Candidate structures are ranked by the cosine similarity between the embedding of the query spectrum and that of each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1–5], JESTR outperforms the other tools by 23.6%–71.6%. We further demonstrate the strong value of regularization with candidate molecules during training: it boosts rank@1 performance by 11.4% and enhances the model's ability to discern between target and candidate molecules. JESTR thus offers a promising new avenue towards accurate annotation, thereby unlocking valuable insights into the metabolome.

Availability: Code and dataset are available at https://github.com/HassounLab/JESTR1/
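The abstract reports results as rank@k for k = 1 to 5 without defining the metric. A common reading, assumed in this sketch, is the percentage of query spectra whose true molecule appears among the top k ranked candidates. The helpers `true_rank` and `rank_at_k` are hypothetical, and the pessimistic tie-breaking rule is an assumption, not necessarily how JESTR's evaluation handles ties.

```python
from typing import Dict, Sequence


def true_rank(scores: Dict[str, float], true_id: str) -> int:
    """1-based rank of the true molecule among scored candidates.

    Ties are broken pessimistically (an assumption): any other candidate
    with a score greater than or equal to the true molecule's score is
    counted as ranked ahead of it."""
    target = scores[true_id]
    ahead = sum(1 for cid, s in scores.items() if cid != true_id and s >= target)
    return ahead + 1


def rank_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of queries whose true molecule is ranked within the top k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ranks:
        raise ValueError("no queries to evaluate")
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)
```

With true-molecule ranks [1, 3, 2, 6] over four queries, rank@1 is 25.0 and rank@5 is 75.0.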

Problem

Research questions and friction points this paper is trying to address.

Improving annotation accuracy for untargeted metabolomics data

Ranking candidate molecules using a joint embedding space

Outperforming existing tools in molecular structure assignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Embeds molecules and spectra in a joint space

Ranks candidates by cosine similarity

Uses regularization with candidate molecules
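The source does not spell out the form of the candidate regularization. One plausible reading, assumed here rather than taken from the paper, is a contrastive (InfoNCE-style) objective in which each spectrum must score its own molecule above the other molecules in the batch, plus an extra term, weighted by a hypothetical `lam`, in which that spectrum's candidate molecules serve as the negatives. All function names, the temperature value, and the data layout are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return dot / norm


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """Negative log-softmax probability of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits.extend(cosine(anchor, neg) / temperature for neg in negatives)
    peak = max(logits)  # subtract the max before exponentiating, for stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[0]


def candidate_regularized_loss(spec_embs: List[Vector], mol_embs: List[Vector],
                               candidate_embs: List[List[Vector]],
                               lam: float = 1.0, temperature: float = 0.1) -> float:
    """Hypothetical objective: in-batch contrastive loss plus lam times a
    contrastive loss that uses each spectrum's candidate molecules as negatives.

    spec_embs[i] and mol_embs[i] are a matched spectrum/molecule pair, and
    candidate_embs[i] holds embeddings of the candidates for spectrum i."""
    n = len(spec_embs)
    if n == 0 or len(mol_embs) != n or len(candidate_embs) != n:
        raise ValueError("inputs must be non-empty and aligned")
    in_batch_total = 0.0
    candidate_total = 0.0
    for i in range(n):
        # Other molecules in the batch act as ordinary negatives.
        others = [mol_embs[j] for j in range(n) if j != i]
        in_batch_total += info_nce(spec_embs[i], mol_embs[i], others, temperature)
        # Candidate molecules for this spectrum act as additional negatives.
        candidate_total += info_nce(spec_embs[i], mol_embs[i], candidate_embs[i], temperature)
    return (in_batch_total + lam * candidate_total) / n
```

Setting `lam=0.0` recovers the plain in-batch objective; under this assumed formulation, a larger `lam` penalizes the model more heavily whenever a candidate molecule scores close to the true molecule.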
