Pushing the limits of one-dimensional NMR spectroscopy for automated structure elucidation using artificial intelligence

📅 2025-12-20
🤖 AI Summary
Automated de novo structure elucidation of organic molecules (up to 40 non-hydrogen atoms, covering C, N, O, H, P, S, Si, B, and the halogens) from 1D $^1$H and $^{13}$C NMR spectra alone remains a longstanding challenge. Method: This work introduces a transformer-based architecture for NMR-driven de novo structure generation that draws on insights from natural language processing and models molecular graphs as sequences, mapping spectra to structures end to end. The approach copes with the combinatorial growth of chemical space, covers the elements typically encountered in organic chemistry, and can be extended to experimental data via fine-tuning. Contribution/Results: Using only the $^1$H and $^{13}$C NMR spectra, the model places the correct molecule among its first 15 predictions in 55.2% of cases (top-15 accuracy), over a chemical space that covers a vast portion of drug-like molecules.

📝 Abstract
One-dimensional NMR spectroscopy is one of the most widely used techniques for the characterization of organic compounds and natural products. For molecules with up to 36 non-hydrogen atoms, the number of possible structures has been estimated to range from $10^{20}$ to $10^{60}$. The task of determining the structure (formula and connectivity) of a molecule of this size using only its one-dimensional $^1$H and/or $^{13}$C NMR spectrum, i.e. de novo structure generation, thus appears completely intractable. Here we show how it is possible to achieve this task for systems with up to 40 non-hydrogen atoms across the full elemental coverage typically encountered in organic chemistry (C, N, O, H, P, S, Si, B, and the halogens) using a deep learning framework, thus covering a vast portion of the drug-like chemical space. Leveraging insights from natural language processing, we show that our transformer-based architecture predicts the correct molecule with 55.2% accuracy within the first 15 predictions using only the $^1$H and $^{13}$C NMR spectra, thus overcoming the combinatorial growth of the chemical space while also being extensible to experimental data via fine-tuning.
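The "accuracy within the first 15 predictions" quoted in the abstract is a top-k metric: a test molecule counts as solved if the correct structure appears anywhere among the first k ranked candidates. The sketch below shows how such a number could be computed. It is illustrative only: the function name `top_k_accuracy`, the toy data, and the assumption that structures are compared as canonicalized identifier strings are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 15,
) -> float:
    """Fraction of targets that appear among the first k ranked candidates.

    Assumption: candidates and targets are already canonicalized strings,
    so plain string equality means "same molecule".
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("need exactly one candidate list per target")
    if not targets:
        raise ValueError("need at least one target")
    hits = sum(
        1
        for candidates, target in zip(ranked_predictions, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


# Toy data (hypothetical): three spectra, two ranked candidates each.
preds = [
    ["CCO", "COC"],           # correct answer ranked second
    ["CC(=O)O", "OCC=O"],     # correct answer ranked first
    ["c1ccccc1", "C1CCCCC1"],  # correct answer not proposed
]
targets = ["COC", "CC(=O)O", "CCN"]

top1 = top_k_accuracy(preds, targets, k=1)
top2 = top_k_accuracy(preds, targets, k=2)
```

On this toy data only the second molecule is solved at k=1, while the first two are solved at k=2, which is why a top-15 figure should not be read as a top-1 figure.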
Problem

Research questions and friction points this paper is trying to address.

Automates de novo molecular structure elucidation from 1D NMR spectra.
Overcomes combinatorial explosion in chemical space for up to 40 heavy atoms.
Predicts correct molecular structures using deep learning and transformer models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning framework automates de novo structure generation
Transformer architecture predicts molecules from NMR spectra
NLP-inspired approach overcomes combinatorial chemical space growth
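An NLP-inspired model needs molecules expressed as token sequences. The page does not say which representation the authors use, so the sketch below assumes SMILES strings purely for illustration and splits them into tokens with a regular expression. The names `SMILES_TOKEN` and `tokenize_smiles` are hypothetical, and the pattern covers only a common subset of SMILES syntax.

```python
import re
from typing import List

# Assumption: SMILES is used as the sequence form; the source does not confirm this.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [Si], [nH], [O-]
    r"|Br|Cl"                # two-letter halogens, tried before single letters
    r"|%\d{2}"               # two-digit ring-closure labels
    r"|[BCNOPSFI]"           # aliphatic organic-subset atoms
    r"|[bcnops]"             # aromatic atoms
    r"|[=#\-+\\/().:~@*$]"   # bonds, branches, and other punctuation
    r"|\d"                   # single-digit ring-closure labels
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so check for a lossless split.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


acyl_chloride = tokenize_smiles("CC(=O)Cl")
silane = tokenize_smiles("C[Si](C)(C)Br")
```

Treating `Cl`, `Br`, and bracket atoms as single tokens keeps each element as one vocabulary entry, which matters when the element set includes Si, B, and the halogens.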
Frank Hu
Department of Chemistry, Stanford University, Stanford, California 94305, United States
Jonathan M. Tubb
Department of Chemistry, Stanford University, Stanford, California 94305, United States
Dimitris Argyropoulos
ACD/Labs, Toronto, Ontario M5C 1B5, Canada
Sergey Golotvin
ACD/Labs, Toronto, Ontario M5C 1B5, Canada
Mikhail Elyashberg
ACD/Labs, Toronto, Ontario M5C 1B5, Canada
Grant M. Rotskoff
Department of Chemistry, Stanford University
Nonequilibrium Statistical Mechanics, Self-Assembly, Biophysics, Machine Learning
Matthew W. Kanan
Department of Chemistry, Stanford University, Stanford, California 94305, United States
Thomas E. Markland
Department of Chemistry, Stanford University, Stanford, California 94305, United States