🤖 AI Summary
Multi-stream architectures for crystal structure generation are complex and prone to overfitting. To address this, this work proposes a lightweight diffusion Transformer. Instead of using separate streams, it jointly models lattice parameters and atomic properties as a single unified sequence. It also adds a periodic-table-guided atomic embedding and a balanced training strategy, which together give the model a strong physics-informed inductive bias. This approach substantially improves generalization and stability in data-scarce scientific settings. On the MP-20 dataset, the method achieves a SUN (Stable, Unique, Novel) rate of 9.62%, surpassing FlowMM and MatterGen. In addition, 63.28% of the generated structures are both unique and novel, while energy stability remains comparable. The results show that, for scientific discovery tasks, a simple, physics-inspired single-stream architecture can outperform complex multi-stream designs.
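The summary does not say how the periodic-table-guided embedding or the unified sequence are built. The Python sketch below shows one plausible reading, under explicit assumptions. Each element is mapped to its (period, column) position in a 32-column long-form periodic table, with Lu and Lr placed in group 3. Its embedding is the sum of a period vector and a column vector. The lattice parameters become one extra token at the front of the same sequence as the atom tokens. All names (`periodic_coords`, `make_tables`, `build_sequence`) and the random embedding tables are hypothetical. Zero-padding stands in for the learned projections a real model would use.

```python
import random

# Last atomic number in each of periods 1-7.
PERIOD_ENDS = [2, 10, 18, 36, 54, 86, 118]


def periodic_coords(z: int) -> tuple:
    """Map atomic number z to (period, column) in a 32-column long-form table.

    Columns 1-2 are groups 1-2, columns 3-16 the f-block, and columns 17-32
    groups 3-18 (Lu and Lr placed in group 3). The mapping is one-to-one.
    """
    if not 1 <= z <= 118:
        raise ValueError(f"atomic number out of range: {z}")
    start = 1
    for period, end in enumerate(PERIOD_ENDS, start=1):
        if z <= end:
            break
        start = end + 1
    pos = z - start + 1          # 1-based position within the period
    length = end - start + 1     # 2, 8, 18 or 32 elements
    if length == 32:             # periods 6-7 already span all 32 columns
        return period, pos
    if length == 2:              # period 1: H in group 1, He in group 18
        group = 1 if pos == 1 else 18
    elif length == 8:            # periods 2-3: s-block, then p-block
        group = pos if pos <= 2 else pos + 10
    else:                        # periods 4-5: groups 1-18 in order
        group = pos
    return period, group if group <= 2 else group + 14


def make_tables(dim: int, seed: int = 0):
    """Hypothetical random tables; index 0 is unused so indices match directly."""
    rng = random.Random(seed)
    periods = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(8)]
    columns = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(33)]
    return periods, columns


def atom_embedding(z: int, tables) -> list:
    """Element embedding = period vector + column vector."""
    periods, columns = tables
    p, c = periodic_coords(z)
    return [a + b for a, b in zip(periods[p], columns[c])]


def build_sequence(lattice, atoms, tables) -> list:
    """One token sequence: [lattice token] + [one token per atom].

    lattice: (a, b, c, alpha, beta, gamma); atoms: list of (z, (fx, fy, fz)).
    Every token has width dim + 3; the lattice token is zero-padded.
    """
    width = len(tables[0][0]) + 3
    if width < len(lattice):
        raise ValueError("dim must be at least 3 to hold six lattice parameters")
    lattice_token = list(lattice) + [0.0] * (width - len(lattice))
    atom_tokens = [atom_embedding(z, tables) + list(frac) for z, frac in atoms]
    return [lattice_token] + atom_tokens


tables = make_tables(dim=8)
# Cubic SrTiO3-like cell: Sr, Ti and three O sites in fractional coordinates.
seq = build_sequence(
    (3.9, 3.9, 3.9, 90.0, 90.0, 90.0),
    [(38, (0.0, 0.0, 0.0)), (22, (0.5, 0.5, 0.5)),
     (8, (0.5, 0.5, 0.0)), (8, (0.5, 0.0, 0.5)), (8, (0.0, 0.5, 0.5))],
    tables,
)
print(len(seq), {len(t) for t in seq})  # 6 {11}
```

Elements in the same column share a column vector. As a result, chemically similar elements (for example, Li and Na) start out with partly shared embeddings, which is the kind of inductive bias the summary describes.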
📝 Abstract
We present CrystalDiT, a diffusion transformer for crystal structure generation that achieves state-of-the-art performance by challenging the trend toward architectural complexity. Instead of intricate multi-stream designs, CrystalDiT employs a unified transformer that imposes a powerful inductive bias: it treats lattice and atomic properties as a single, interdependent system. Combined with a periodic-table-based atomic representation and a balanced training strategy, our approach achieves a 9.62% SUN (Stable, Unique, Novel) rate on MP-20, substantially outperforming recent methods including FlowMM (4.38%) and MatterGen (3.42%). Notably, 63.28% of the structures generated by CrystalDiT are unique and novel, while its stability rate remains comparable. This demonstrates that architectural simplicity can be more effective than complexity for materials discovery. Our results suggest that in data-limited scientific domains, carefully designed simple architectures outperform sophisticated alternatives that are prone to overfitting.
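As a concrete reading of the metrics quoted above, the SUN rate is the fraction of generated structures that are simultaneously stable, unique, and novel. The unique-and-novel rate drops the stability condition. The Python sketch below assumes each structure has already been reduced to two things: a hashable fingerprint and an energy above the convex hull. The fingerprint stands in for proper structure matching against the other samples and the training set. The `Generated` class, the fingerprints, and the default `hull_threshold` are hypothetical. Published evaluations differ in the stability cutoff and matching tolerances they use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generated:
    fingerprint: str      # hypothetical stand-in for structure matching
    e_above_hull: float   # energy above the convex hull, eV/atom


def sun_and_un_rates(samples, training_fingerprints, hull_threshold=0.0):
    """Return (SUN rate, unique-and-novel rate) over all generated samples.

    A sample counts as unique only at the first occurrence of its fingerprint,
    novel if the fingerprint is absent from the training set, and stable if
    its energy above hull is at most hull_threshold (an assumed cutoff).
    """
    if not samples:
        return 0.0, 0.0
    seen = set()
    sun = un = 0
    for s in samples:
        unique = s.fingerprint not in seen
        seen.add(s.fingerprint)
        novel = s.fingerprint not in training_fingerprints
        if unique and novel:
            un += 1
            if s.e_above_hull <= hull_threshold:
                sun += 1
    return sun / len(samples), un / len(samples)


samples = [
    Generated("A", -0.01),  # stable, unique, novel -> counts toward SUN
    Generated("A", -0.02),  # duplicate of "A" -> not unique
    Generated("B", 0.05),   # unique and novel but above the cutoff
    Generated("C", 0.00),   # already in the training set -> not novel
    Generated("D", 0.30),   # unique and novel but unstable
]
print(sun_and_un_rates(samples, {"C"}))  # (0.2, 0.6)
```

Every SUN structure is also unique and novel. The unique-and-novel rate is therefore always at least the SUN rate, which is why the two figures in the abstract (63.28% and 9.62%) can differ so widely.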