🤖 AI Summary
This study addresses end-to-end generation of molecular structures from multimodal spectra, namely infrared (IR) and dual-nucleus (¹H/¹³C) NMR, a long-standing challenge in analytical chemistry. We propose a two-stage conditional generative framework: first, a generator is pre-trained to reconstruct molecular structures from count-aware molecular fragment encodings; second, a joint spectral encoder maps IR and both NMR modalities into a unified conditional embedding that guides this generator. Our approach introduces, for the first time, count-aware fragment representations and a joint multimodal spectral conditioning mechanism, enabling fully automated structure elucidation without expert-defined rules or candidate structure libraries. Built on an architecture that combines deep generative models, conditional variational autoencoders, and pre-trained generators, our method achieves a 12.6% absolute improvement in top-1 accuracy on standard elucidation benchmarks and remains robust on highly complex molecules.
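To make "count-aware" concrete: a binary fragment fingerprint records only whether a fragment is present, whereas a count-aware encoding also records how many times it occurs. The minimal sketch below assumes fragments have already been extracted from each molecule (for example by a substructure matcher) and uses a tiny hypothetical vocabulary (`VOCAB`) and hypothetical helper names; it illustrates the idea and is not the paper's actual fragmentation scheme.

```python
from collections import Counter

# Hypothetical toy fragment vocabulary (illustrative only; not the paper's vocabulary).
VOCAB = ["CH3", "CH2", "OH", "C=O", "c1ccccc1"]

def binary_encoding(fragments: list[str], vocab: list[str] = VOCAB) -> list[int]:
    """Presence/absence only: 1 if the fragment occurs at least once, else 0."""
    present = set(fragments)
    return [1 if f in present else 0 for f in vocab]

def count_encoding(fragments: list[str], vocab: list[str] = VOCAB) -> list[int]:
    """Count-aware: number of occurrences of each vocabulary fragment."""
    counts = Counter(fragments)  # missing keys count as 0
    return [counts[f] for f in vocab]

# Simplified group decompositions (assumed for illustration).
ethanol = ["CH3", "CH2", "OH"]                  # CH3-CH2-OH
butanol = ["CH3", "CH2", "CH2", "CH2", "OH"]    # CH3-(CH2)3-OH

print(binary_encoding(ethanol), binary_encoding(butanol))
print(count_encoding(ethanol), count_encoding(butanol))
```

With this toy vocabulary, ethanol and 1-butanol receive the same binary encoding, `[1, 1, 1, 0, 0]`, but different count encodings, `[1, 1, 1, 0, 0]` versus `[1, 3, 1, 0, 0]`. A count-aware representation is intended to remove this kind of ambiguity.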
📝 Abstract
Molecular structure elucidation from spectroscopic data is a long-standing challenge in chemistry, traditionally requiring expert interpretation. We introduce NMIRacle, a two-stage generative framework that builds on recent paradigms in AI-driven spectroscopy with minimal assumptions. In the first stage, NMIRacle learns to reconstruct molecular structures from count-aware fragment encodings, which capture both fragment identities and their occurrence counts. In the second stage, a spectral encoder maps the input spectroscopic measurements (IR, ¹H-NMR, ¹³C-NMR) into a latent embedding that conditions the pre-trained generator. This formulation bridges fragment-level chemical modeling with spectral evidence, yielding accurate molecular predictions. Empirical results show that NMIRacle outperforms existing baselines on molecular structure elucidation while maintaining robust performance as molecular complexity increases.
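The second stage hinges on an interface constraint: whatever the spectral encoder does internally, it has to emit a vector in the latent space that the stage-1 generator was trained to be conditioned on. The sketch below illustrates only that interface. The input dimensions, the per-modality linear-plus-ReLU projections, the concatenation fusion, the class name `JointSpectralEncoder`, and the mean-squared-error alignment to the stage-1 fragment latent are all hypothetical choices made for illustration. None of them are stated in the abstract.

```python
import random

# Hypothetical stage-2 sketch: a joint spectral encoder whose output lives in the
# same latent space the frozen stage-1 generator is conditioned on.
# All dimensions, layers, and the alignment loss are illustrative assumptions.

def random_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    """Small Gaussian-initialised weight matrix, stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def relu(x: list[float]) -> list[float]:
    return [max(0.0, v) for v in x]

class JointSpectralEncoder:
    """One projection per modality (e.g. IR, 1H-NMR, 13C-NMR), concatenated, then fused."""

    def __init__(self, input_dims: dict[str, int], latent_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.modalities = sorted(input_dims)  # fixed concatenation order
        self.proj = {m: random_matrix(latent_dim, input_dims[m], rng) for m in self.modalities}
        self.fuse = random_matrix(latent_dim, latent_dim * len(self.modalities), rng)

    def __call__(self, spectra: dict[str, list[float]]) -> list[float]:
        parts: list[float] = []
        for m in self.modalities:
            parts.extend(relu(matvec(self.proj[m], spectra[m])))
        return matvec(self.fuse, parts)  # conditioning vector for the generator

def alignment_loss(z_spec: list[float], z_frag: list[float]) -> float:
    """Assumed objective: mean squared error to the stage-1 fragment latent."""
    return sum((a - b) ** 2 for a, b in zip(z_spec, z_frag)) / len(z_frag)

# Toy usage with made-up spectrum lengths.
encoder = JointSpectralEncoder({"IR": 8, "1H": 6, "13C": 5}, latent_dim=4)
spectra = {"IR": [0.1] * 8, "1H": [0.2] * 6, "13C": [0.3] * 5}
z = encoder(spectra)
print(len(z))  # 4, matching the assumed generator latent size
```

Matching the stage-1 latent is only one plausible way to train such an encoder. The abstract does not say how NMIRacle's encoder is trained. Training end-to-end through the frozen generator with a structure-reconstruction loss would be an equally reasonable reading.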