🤖 AI Summary
This work addresses the heavy reliance on expert knowledge in small-molecule structure elucidation by proposing ChefNMR, an end-to-end method that generates 3D molecular structures directly from NMR spectra. ChefNMR frames elucidation as conditional generation, pairing an atomic diffusion model with a non-equivariant Transformer to predict 3D structures from only 1D ¹H/¹³C NMR spectra and the molecular formula. To train the model on the complex chemical groups found in natural products, the authors build a dataset of simulated 1D NMR spectra for over 111,000 natural products. On a challenging natural-product test set, ChefNMR achieves over 65% Top-1 structural accuracy, outperforming existing approaches. The code is publicly available.
📝 Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is a cornerstone technique for determining the structures of small molecules and is especially critical in the discovery of novel natural products and clinical therapeutics. Yet, interpreting NMR spectra remains a time-consuming, manual process requiring extensive domain expertise. We introduce ChefNMR (CHemical Elucidation From NMR), an end-to-end framework that directly predicts an unknown molecule's structure solely from its 1D NMR spectra and chemical formula. We frame structure elucidation as conditional generation from an atomic diffusion model built on a non-equivariant transformer architecture. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. ChefNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%. This work takes a significant step toward solving the grand challenge of automating small-molecule structure elucidation and highlights the potential of deep learning in accelerating molecular discovery. Code is available at https://github.com/ml-struct-bio/chefnmr.
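To make the "conditional generation from an atomic diffusion model" framing more concrete, the following is a minimal sketch and not the ChefNMR implementation. It assumes the chemical formula fixes the number and types of atoms, and that a trained network (replaced here by the hypothetical placeholder `toy_denoiser`) predicts clean 3D coordinates from noisy ones given the atom types and a spectrum feature vector. The function names, the step schedule, and the update rule are all illustrative assumptions.

```python
import random
import re


def parse_formula(formula):
    """Expand a simple formula such as 'C2H6O' into a per-atom element list."""
    atoms = []
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms.extend([element] * (int(count) if count else 1))
    return atoms


def toy_denoiser(coords, t, atom_types, spectrum):
    """Hypothetical stand-in for a trained network that predicts clean coordinates.

    A real model would use the atom types and spectrum features; this
    placeholder ignores them and just shrinks the noisy coordinates
    toward the origin.
    """
    return [[0.5 * x for x in atom] for atom in coords]


def sample_structure(formula, spectrum, steps=50, seed=0):
    """Run a toy deterministic denoising loop conditioned on formula and spectrum."""
    rng = random.Random(seed)
    atom_types = parse_formula(formula)
    # Start from Gaussian noise: one 3D point per atom in the formula.
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in atom_types]
    for step in range(steps, 0, -1):
        t = step / steps
        t_next = (step - 1) / steps
        x0_hat = toy_denoiser(coords, t, atom_types, spectrum)
        # Move part of the way toward the predicted clean sample;
        # at the last step (t_next == 0) the prediction is taken as-is.
        w = t_next / t
        coords = [
            [w * x + (1.0 - w) * x0 for x, x0 in zip(atom, atom0)]
            for atom, atom0 in zip(coords, x0_hat)
        ]
    return atom_types, coords
```

Calling `sample_structure("C2H6O", spectrum=[0.0] * 8)` returns nine element labels and nine 3D points, one per atom in the formula. In a real system, the placeholder denoiser would be the learned, spectrum-conditioned network, and bonds would still need to be inferred from the generated coordinates.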