Chemistry-Enhanced Diffusion-Based Framework for Small-to-Large Molecular Conformation Generation

📅 2025-11-15
🤖 AI Summary
Existing methods for generating 3D conformations of large molecules at quantum-chemical accuracy suffer from high computational cost, heavy reliance on large-scale conformational data, and poor generalizability. This paper introduces StoL, a zero-shot conformation generation framework based on chemically meaningful fragment decomposition and diffusion modeling. Its core innovation lies in decomposing molecules into chemically valid fragments and assembling them via a physics-informed mechanism, enabling cross-scale (small-to-large molecule) conformation generation without training on large-molecule conformation datasets. StoL integrates fragment-level diffusion generation, geometrically constrained assembly, and DFT-based validation, achieving high chemical validity and conformational diversity while significantly improving efficiency and accuracy. Experiments demonstrate state-of-the-art performance on benchmarks including QM9 and GEOM-DRUGS. Moreover, DFT-optimized structures generated by StoL exhibit energy and geometric errors within chemically acceptable thresholds.

📝 Abstract
Obtaining 3D conformations of realistic polyatomic molecules at the quantum chemistry level remains challenging, and although recent machine learning advances offer promise, predicting large-molecule structures still requires substantial computational effort. Here, we introduce StoL, a diffusion model-based framework that enables rapid and knowledge-free generation of large molecular structures from small-molecule data. Remarkably, StoL assembles molecules in a LEGO-style fashion from scratch, without seeing the target molecules or any structures of comparable size during training. Given a SMILES input, it decomposes the molecule into chemically valid fragments, generates their 3D structures with a diffusion model trained on small molecules, and assembles them into diverse conformations. This fragment-based strategy eliminates the need for large-molecule training data while maintaining high scalability and transferability. By embedding chemical principles into key steps, StoL ensures faster convergence, chemically rational structures, and broad configurational coverage, as confirmed against DFT calculations.
Problem

Research questions and friction points this paper is trying to address.

Generating quantum-accurate 3D conformations for large polyatomic molecules efficiently
Reducing the computational cost of predicting large-molecule structures without any large-molecule training data
Assembling chemically valid molecular fragments into diverse conformations from SMILES
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chemistry-enhanced diffusion model for molecular generation
LEGO-style fragment assembly without large-molecule training
Decomposition of a SMILES input into chemically valid fragments, whose 3D structures are generated by a diffusion model trained on small molecules
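
The fragment-decomposition idea listed above can be illustrated with a minimal sketch. It is not StoL's actual fragmentation scheme, which this summary does not specify. The sketch assumes a deliberately simple rule: cut every single bond that lies outside a ring and joins two non-terminal heavy atoms. The molecule is given as a hand-written heavy-atom graph rather than parsed from SMILES, and the names `find_bridges`, `split_into_fragments`, and `BIPHENYL_BONDS` are hypothetical.

```python
from collections import defaultdict


def find_bridges(n_atoms, bonds):
    """Return indices of bonds that are in no ring (graph bridges).

    Recursive DFS (Tarjan's bridge algorithm); fine for small molecules,
    but very large graphs could hit Python's recursion limit.
    """
    adj = defaultdict(list)
    for idx, (a, b, _order) in enumerate(bonds):
        adj[a].append((b, idx))
        adj[b].append((a, idx))

    disc = [-1] * n_atoms  # DFS discovery time per atom
    low = [0] * n_atoms    # lowest discovery time reachable from the subtree
    bridges = set()
    timer = 0

    def dfs(u, parent_bond):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_bond:
                continue
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(idx)  # no back edge spans this bond
            else:
                low[u] = min(low[u], disc[v])

    for start in range(n_atoms):
        if disc[start] == -1:
            dfs(start, -1)
    return bridges


def split_into_fragments(n_atoms, bonds):
    """Cut acyclic single bonds between non-terminal atoms (assumed rule)."""
    degree = [0] * n_atoms
    for a, b, _order in bonds:
        degree[a] += 1
        degree[b] += 1

    cut = {
        i for i in find_bridges(n_atoms, bonds)
        if bonds[i][2] == 1 and degree[bonds[i][0]] > 1 and degree[bonds[i][1]] > 1
    }

    # Connected components of the graph that remains after cutting.
    adj = defaultdict(list)
    for i, (a, b, _order) in enumerate(bonds):
        if i not in cut:
            adj[a].append(b)
            adj[b].append(a)

    seen, fragments = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        fragments.append(sorted(component))
    return fragments, sorted(cut)


# Biphenyl heavy-atom graph as (atom_i, atom_j, bond_order), Kekule form.
BIPHENYL_BONDS = [
    (0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1),      # ring A
    (6, 7, 2), (7, 8, 1), (8, 9, 2), (9, 10, 1), (10, 11, 2), (11, 6, 1),  # ring B
    (0, 6, 1),                                                             # inter-ring bond
]

fragments, cut_bonds = split_into_fragments(12, BIPHENYL_BONDS)
print(fragments, cut_bonds)
```

For biphenyl, this rule cuts only the inter-ring bond and leaves the two six-membered rings as separate fragments. In a pipeline of the kind the abstract describes, each fragment would then be passed to the small-molecule diffusion model and the cut bonds would mark where fragments are rejoined during assembly.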
Yifei Zhu
Shanghai Jiao Tong University
Edge computing, multimedia networking, distributed ML systems
Jiahui Zhang
SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China.
Jiawei Peng
Southeast University
Multimodal
Mengge Li
MOE Key Laboratory of Environmental Theoretical Chemistry, South China Normal University, Guangzhou 510006, P. R. China.
Chao Xu
SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China.
Zhenggang Lan
SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China.