🤖 AI Summary
Existing methods for generating 3D conformations of large molecules at quantum-chemical accuracy suffer from high computational cost, heavy reliance on large-scale conformational data, and poor generalizability. This paper introduces StoL, a zero-shot conformation generation framework based on chemically meaningful fragment decomposition and diffusion modeling. Its core idea is to decompose a molecule into chemically valid substructures and assemble them via a physics-informed mechanism, enabling cross-scale (small-to-large molecule) conformation generation without training on large-molecule conformation datasets. StoL combines fragment-level diffusion generation, geometrically constrained assembly, and DFT-based validation, achieving high chemical validity and conformational diversity while improving efficiency and accuracy. Experiments demonstrate state-of-the-art performance on benchmarks including QM9 and GEOM-DRUGS. Moreover, DFT-optimized structures generated by StoL exhibit energy and geometric errors within chemically acceptable thresholds.
📝 Abstract
Obtaining 3D conformations of realistic polyatomic molecules at the quantum chemistry level remains challenging, and although recent machine learning advances offer promise, predicting large-molecule structures still requires substantial computational effort. Here, we introduce StoL, a diffusion model-based framework that enables rapid and knowledge-free generation of large molecular structures from small-molecule data. Remarkably, StoL assembles molecules in a LEGO-style fashion from scratch, without seeing the target molecules or any structures of comparable size during training. Given a SMILES input, it decomposes the molecule into chemically valid fragments, generates their 3D structures with a diffusion model trained on small molecules, and assembles them into diverse conformations. This fragment-based strategy eliminates the need for large-molecule training data while maintaining high scalability and transferability. By embedding chemical principles into key steps, StoL ensures faster convergence, chemically rational structures, and broad configurational coverage, as confirmed against DFT calculations.
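The decomposition step described in the abstract can be illustrated with a toy sketch. The abstract does not state StoL's actual fragmentation rules, so the rule used here (cut every acyclic bond that joins two non-terminal heavy atoms) is an assumption chosen only for illustration. The function names `find_bridges` and `fragment` are hypothetical, and the hand-written biphenyl graph stands in for a molecule that would normally be parsed from a SMILES string by a cheminformatics toolkit.

```python
from collections import defaultdict


def find_bridges(n_atoms, bonds):
    """Return indices of bonds that lie in no ring (graph bridges)."""
    adj = defaultdict(list)
    for idx, (a, b) in enumerate(bonds):
        adj[a].append((b, idx))
        adj[b].append((a, idx))

    disc = [-1] * n_atoms  # DFS discovery time of each atom
    low = [0] * n_atoms    # earliest discovery time reachable from the subtree
    bridges = set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # no back edge bypasses this bond
                    bridges.add(idx)
            else:
                low[u] = min(low[u], disc[v])

    for start in range(n_atoms):
        if disc[start] == -1:
            dfs(start, -1)
    return bridges


def fragment(n_atoms, bonds):
    """Assumed toy rule: cut acyclic bonds between two non-terminal atoms.

    Returns (fragments, cut_bonds), where fragments are sorted lists of
    atom indices and cut_bonds records where fragments must be rejoined.
    """
    degree = [0] * n_atoms
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1

    cut = {i for i in find_bridges(n_atoms, bonds)
           if degree[bonds[i][0]] > 1 and degree[bonds[i][1]] > 1}

    # Connected components over the bonds that were kept.
    adj = defaultdict(list)
    for i, (a, b) in enumerate(bonds):
        if i not in cut:
            adj[a].append(b)
            adj[b].append(a)

    seen, fragments = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        fragments.append(sorted(component))
    return fragments, [bonds[i] for i in sorted(cut)]


# Heavy-atom graph of biphenyl: two six-membered rings joined by one bond.
ring_a = [(i, (i + 1) % 6) for i in range(6)]
ring_b = [(6 + i, 6 + (i + 1) % 6) for i in range(6)]
biphenyl_bonds = ring_a + ring_b + [(0, 6)]

fragments, cut_bonds = fragment(12, biphenyl_bonds)
print(fragments)
print(cut_bonds)
```

In a pipeline of the kind the abstract describes, each fragment would then receive 3D coordinates from the small-molecule diffusion model, and the recorded cut bonds would tell the assembly step where to rejoin fragments. Neither of those stages is sketched here.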