🤖 AI Summary
This work addresses the inefficiency and limited structural modeling capacity of conventional crystal generation methods that rely on computationally expensive equivariant graph neural networks. The authors propose a lightweight diffusion Transformer that replaces high-dimensional one-hot encodings with subatomic tokenization and introduces a Geometry Enhancement Module (GEM) that injects periodic geometric information from minimum-image pairs directly into the attention mechanism. This approach preserves the computational efficiency of standard Transformers while effectively integrating crystallographic chemistry and spatial symmetry. The method achieves state-of-the-art performance on both crystal structure prediction and de novo generation tasks, attaining the highest S.U.N. discovery score and sampling significantly faster than existing geometry-intensive baselines.
📝 Abstract
Generative models for crystalline materials often rely on equivariant graph neural networks, which capture geometric structure well but are costly to train and slow to sample. We present Crystalite, a lightweight diffusion Transformer for crystal modeling built around two simple inductive biases. The first is Subatomic Tokenization, a compact, chemically structured atom representation that replaces high-dimensional one-hot encodings and is better suited to continuous diffusion. The second is the Geometry Enhancement Module (GEM), which injects periodic minimum-image pair geometry directly into attention through additive geometric biases. Together, these components preserve the simplicity and efficiency of a standard Transformer while making it better matched to the structure of crystalline materials. Crystalite achieves state-of-the-art results on crystal structure prediction benchmarks and in de novo generation, attaining the best S.U.N. discovery score among the evaluated baselines while sampling substantially faster than geometry-heavy alternatives.
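To make the GEM idea concrete, the sketch below shows one plausible reading of "injecting minimum-image pair geometry into attention as an additive bias": compute pairwise distances under the periodic minimum-image convention, map them through a simple radial function, and add the result to the attention logits. This is a hedged illustration, not the paper's implementation; the function names, the exponential radial bias, and the single-head setup are all assumptions for clarity.

```python
import numpy as np

def min_image_distances(frac, lattice):
    """Pairwise distances under the minimum-image convention.

    frac:    (N, 3) fractional coordinates in [0, 1)
    lattice: (3, 3) matrix whose rows are the lattice vectors
    """
    d = frac[:, None, :] - frac[None, :, :]   # (N, N, 3) fractional differences
    d -= np.round(d)                          # wrap into [-0.5, 0.5): minimum image
    cart = d @ lattice                        # convert to Cartesian displacements
    return np.linalg.norm(cart, axis=-1)      # (N, N) distance matrix

def attention_with_geom_bias(x, Wq, Wk, Wv, dist, w_bias, b_bias):
    """Single-head attention with an additive, distance-dependent logit bias.

    The exp(-dist / b_bias) term is a toy stand-in for a learned geometric
    bias; it keeps attention a standard softmax over biased logits.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits + w_bias * np.exp(-dist / b_bias)   # additive geometric bias
    a = np.exp(logits - logits.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)                  # softmax over keys
    return a @ v
```

Because the bias enters only as an additive term on the logits, the rest of the Transformer is unchanged, which is consistent with the abstract's claim that GEM preserves the efficiency of standard attention.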