OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction

📅 2025-12-07
🤖 AI Summary
Crystal structure prediction (CSP), the task of generating experimentally realizable 3D crystal structures from 2D molecular graphs, is a long-standing challenge in computational chemistry with direct implications for drug discovery and organic semiconductor design. This work introduces OXtal, a 100M-parameter all-atom diffusion model that learns the conditional joint distribution over intramolecular conformations and periodic packing. OXtal drops explicit equivariant architectures in favor of data augmentation, and it is trained with S⁴ (Stoichiometric Stochastic Shell Sampling), a crystallization-inspired, lattice-free scheme that captures long-range interactions without explicit lattice parameterization. Trained on 600K experimentally validated structures, OXtal reports orders-of-magnitude improvements over prior ab initio machine learning CSP methods while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. It recovers experimental structures with conformer RMSD₁ < 0.5 Å and attains a packing similarity rate above 80%.

📝 Abstract
Accurately predicting experimentally-realizable 3D molecular crystal structures from their 2D chemical graphs is a long-standing open challenge in computational chemistry called crystal structure prediction (CSP). Efficiently solving this problem has implications ranging from pharmaceuticals to organic semiconductors, as crystal packing directly governs the physical and chemical properties of organic solids. In this paper, we introduce OXtal, a large-scale 100M parameter all-atom diffusion model that directly learns the conditional joint distribution over intramolecular conformations and periodic packing. To efficiently scale OXtal, we abandon explicit equivariant architectures imposing inductive bias arising from crystal symmetries in favor of data augmentation strategies. We further propose a novel crystallization-inspired lattice-free training scheme, Stoichiometric Stochastic Shell Sampling ($S^4$), that efficiently captures long-range interactions while sidestepping explicit lattice parametrization, thus enabling more scalable architectural choices at all-atom resolution. By leveraging a large dataset of 600K experimentally validated crystal structures (including rigid and flexible molecules, co-crystals, and solvates), OXtal achieves orders-of-magnitude improvements over prior ab initio machine learning CSP methods, while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. Specifically, OXtal recovers experimental structures with conformer $\text{RMSD}_1<0.5$ Å and attains over 80% packing similarity rate, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
Problem

Research questions and friction points this paper is trying to address.

Predicts 3D organic crystal structures from 2D chemical graphs.
Models intramolecular conformations and periodic packing distributions.
Improves accuracy and efficiency over prior computational methods.
Innovation

Methods, ideas, or system contributions that make the work stand out.

All-atom diffusion model for crystal structure prediction
Data augmentation replaces explicit equivariant architectures
Lattice-free training scheme captures long-range interactions efficiently
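
The abstract does not say which augmentations OXtal uses, so the following is only a minimal sketch of the general idea behind replacing an equivariant architecture with data augmentation: each training sample is passed through a random rigid motion so that a non-equivariant network sees many orientations of the same structure. The function names (`random_rotation`, `augment`, `pairwise_distances`), the `max_shift` parameter, and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math
import random


def random_rotation(rng):
    """Sample a 3x3 rotation matrix uniformly from SO(3) via a random unit quaternion."""
    # Normalizing four independent Gaussians gives a uniform point on the 3-sphere.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def augment(coords, rng, max_shift=1.0):
    """Apply one random rigid motion (rotation, then translation) to every atom."""
    rot = random_rotation(rng)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3))
        for p in coords
    ]


def pairwise_distances(coords):
    """Return all interatomic distances, which a rigid motion must leave unchanged."""
    return [math.dist(a, b) for i, a in enumerate(coords) for b in coords[i + 1:]]


# Toy example: three made-up atom positions in angstroms (hypothetical data).
rng = random.Random(0)
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.2, 0.3)]
moved = augment(atoms, rng)
```

Because the transform is rigid, `pairwise_distances(atoms)` and `pairwise_distances(moved)` agree up to floating-point error. The geometry the model must learn is unchanged, and only its pose in space varies between samples.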