Pushing the limits of one-dimensional NMR spectroscopy for automated structure elucidation using artificial intelligence

📅 2025-12-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Automated de novo structure elucidation of organic molecules solely from 1D $^1$H and/or $^{13}$C NMR spectra appears intractable because of the combinatorial size of chemical space. Method: This work introduces a transformer-based deep learning framework, drawing on insights from natural language processing, that predicts molecular structure (formula and connectivity) directly from the spectra for molecules with up to 40 non-hydrogen atoms composed of C, N, O, H, P, S, Si, B, and the halogens. The model can be extended to experimental data via fine-tuning. Contribution/Results: Using only the $^1$H and $^{13}$C NMR spectra, the correct molecule appears among the first 15 predictions in 55.2% of cases (top-15 accuracy), covering a vast portion of the drug-like chemical space.

📝 Abstract

One-dimensional NMR spectroscopy is one of the most widely used techniques for the characterization of organic compounds and natural products. For molecules with up to 36 non-hydrogen atoms, the number of possible structures has been estimated to range from $10^{20}$ to $10^{60}$. The task of determining the structure (formula and connectivity) of a molecule of this size using only its one-dimensional $^1$H and/or $^{13}$C NMR spectrum, i.e. de novo structure generation, thus appears completely intractable. Here we show how it is possible to achieve this task for systems with up to 40 non-hydrogen atoms across the full elemental coverage typically encountered in organic chemistry (C, N, O, H, P, S, Si, B, and the halogens) using a deep learning framework, thus covering a vast portion of the drug-like chemical space. Leveraging insights from natural language processing, we show that our transformer-based architecture predicts the correct molecule with 55.2% accuracy within the first 15 predictions using only the $^1$H and $^{13}$C NMR spectra, thus overcoming the combinatorial growth of the chemical space while also being extensible to experimental data via fine-tuning.
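The headline figure, 55.2% accuracy within the first 15 predictions, is a top-k accuracy with k = 15: a sample counts as correct if the true molecule appears anywhere among the first 15 ranked candidates. The sketch below shows how such a metric could be computed. It is not the authors' evaluation code; the function name `top_k_accuracy`, the toy SMILES strings, and the assumption that predictions and targets are compared as already-canonicalized strings are all illustrative.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 15,
) -> float:
    """Fraction of samples whose target is among the first k ranked candidates.

    Assumption: candidates and targets are canonicalized structure strings
    (for example canonical SMILES), so exact string equality means the same
    molecule.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("need exactly one ranked candidate list per target")
    if not targets:
        raise ValueError("cannot compute accuracy over an empty dataset")
    hits = sum(
        target in candidates[:k]
        for candidates, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Toy data (hypothetical): three spectra, two ranked candidates each.
predictions = [
    ["CCO", "COC"],           # target ranked first
    ["CC(=O)O", "OCC=O"],     # target ranked second
    ["c1ccccc1", "C1CCCCC1"], # target missing
]
targets = ["CCO", "OCC=O", "CCN"]

top1 = top_k_accuracy(predictions, targets, k=1)
top2 = top_k_accuracy(predictions, targets, k=2)
```

On this toy data one of three targets is ranked first and two of three fall within the first two candidates, which illustrates why a top-15 figure should not be read as top-1 accuracy.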

Problem

Research questions and friction points this paper is trying to address.

Automates de novo molecular structure elucidation from 1D NMR spectra.

Overcomes combinatorial explosion in chemical space for up to 40 heavy atoms.

Predicts correct molecular structures using deep learning and transformer models.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning framework automates de novo structure generation.

Transformer architecture predicts molecules from NMR spectra.

NLP-inspired approach overcomes combinatorial chemical space growth.
