🤖 AI Summary
Existing crystal generation methods are typically task-specific and lack a unified framework that can handle multimodal tasks such as crystal structure prediction (CSP) and de novo generation (DNG). This work proposes the Multimodal Crystal Flow model (MCFlow), which unifies these tasks within a single multimodal flow model. By assigning independent time variables to atom types and crystal structures, MCFlow realizes each task as a distinct inference trajectory. The approach combines a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation in a standard Transformer architecture, injecting compositional and crystallographic priors without explicit structural templates. On the MP-20 and MPTS-52 benchmarks, MCFlow achieves performance competitive with task-specific baselines across multiple crystal generation tasks.
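The idea of realizing different tasks as trajectories over two independent clocks can be illustrated with a minimal sketch. It rests on several assumptions not taken from the paper: straight-line paths between noise (t = 0) and data (t = 1), atom types encoded as continuous values, fractional coordinates treated without periodic wrapping, and an exact "oracle" velocity standing in for the trained Transformer. The names `task_schedule`, `integrate`, and `oracle_velocity`, and the particular CSP and DNG schedules shown, are hypothetical.

```python
# Hypothetical sketch: two independent time variables (t_A for atom types,
# t_X for structure) turn one velocity model into several tasks.
from typing import Callable, List, Tuple

TimePair = Tuple[float, float]  # (t_A, t_X); t = 1 means clean data (assumed convention)
Velocity = Callable[[List[float], List[float], float, float], Tuple[List[float], List[float]]]


def task_schedule(task: str, n_steps: int) -> List[TimePair]:
    """Hypothetical (t_A, t_X) path for a task, sampled at n_steps + 1 points."""
    grid = [i / n_steps for i in range(n_steps + 1)]
    if task == "csp":
        # Atom types are the condition: hold t_A = 1 and integrate the structure only.
        return [(1.0, t) for t in grid]
    if task == "dng":
        # De novo generation: advance both clocks together from noise to data.
        return [(t, t) for t in grid]
    raise ValueError(f"unknown task: {task!r}")


def integrate(a0: List[float], x0: List[float], velocity: Velocity,
              schedule: List[TimePair]) -> Tuple[List[float], List[float]]:
    """Euler steps along the schedule; each modality moves only when its own clock moves."""
    a, x = list(a0), list(x0)
    for (ta0, tx0), (ta1, tx1) in zip(schedule, schedule[1:]):
        va, vx = velocity(a, x, ta0, tx0)
        a = [ai + (ta1 - ta0) * vi for ai, vi in zip(a, va)]
        x = [xi + (tx1 - tx0) * vi for xi, vi in zip(x, vx)]
    return a, x


def oracle_velocity(target_a: List[float], target_x: List[float]) -> Velocity:
    """Stand-in for the learned network: exact velocity of a straight path to a known target."""
    def v(a: List[float], x: List[float], ta: float, tx: float):
        va = [0.0 if ta >= 1.0 else (g - c) / (1.0 - ta) for g, c in zip(target_a, a)]
        vx = [0.0 if tx >= 1.0 else (g - c) / (1.0 - tx) for g, c in zip(target_x, x)]
        return va, vx
    return v
```

Under the CSP schedule the type clock never moves, so the given atom types pass through unchanged while only the structure is integrated; under the DNG schedule both clocks advance, so the same velocity model and integrator produce both modalities.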
📝 Abstract
Crystal modeling spans a family of conditional and unconditional generation tasks across different modalities, including crystal structure prediction (CSP) and de novo generation (DNG). While recent deep generative models have shown promising performance, they remain largely task-specific, lacking a unified framework that shares crystal representations across different generation tasks. To address this limitation, we propose Multimodal Crystal Flow (MCFlow), a unified multimodal flow model that realizes multiple crystal generation tasks as distinct inference trajectories via independent time variables for atom types and crystal structures. To enable multimodal flow in a standard transformer model, we introduce a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation, injecting strong compositional and crystallographic priors without explicit structural templates. Experiments on the MP-20 and MPTS-52 benchmarks show that MCFlow achieves competitive performance against task-specific baselines across multiple crystal generation tasks.
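The atom ordering and hierarchical permutation augmentation can be illustrated with a small sketch. It assumes that each atom carries an element symbol and a hypothetical integer orbit index (for example, the index of the Wyckoff orbit it occupies). It further assumes that the canonical order sorts by element and then by orbit, and that augmentation shuffles element blocks, then orbits within each block, then atoms within each orbit. The function names and this three-level scheme are illustrative assumptions, not MCFlow's published procedure.

```python
# Hypothetical sketch of composition- and symmetry-aware ordering with
# hierarchical permutation augmentation (not MCFlow's published code).
import random
from collections.abc import Hashable, Iterable
from itertools import groupby
from typing import List, Tuple

Atom = Tuple[str, int, int]  # (element, orbit_id, atom_id); orbit_id is an assumed Wyckoff-orbit index


def canonical_order(atoms: List[Atom]) -> List[Atom]:
    """Sort by element (composition), then orbit (symmetry), then id for determinism."""
    return sorted(atoms, key=lambda a: (a[0], a[1], a[2]))


def hierarchical_permute(atoms: List[Atom], rng: random.Random) -> List[Atom]:
    """Shuffle element blocks, then orbits within each block, then atoms within each orbit."""
    ordered = canonical_order(atoms)
    blocks = [list(g) for _, g in groupby(ordered, key=lambda a: a[0])]
    rng.shuffle(blocks)  # level 1: order of element blocks
    out: List[Atom] = []
    for block in blocks:
        orbits = [list(g) for _, g in groupby(block, key=lambda a: a[1])]
        rng.shuffle(orbits)  # level 2: order of orbits within one element
        for orbit in orbits:
            rng.shuffle(orbit)  # level 3: order of atoms within one orbit
            out.extend(orbit)
    return out


def is_contiguous(keys: Iterable[Hashable]) -> bool:
    """True if every distinct key occupies a single contiguous run."""
    seen = set()
    prev = object()  # sentinel that never equals a real key
    for k in keys:
        if k != prev:
            if k in seen:
                return False
            seen.add(k)
            prev = k
    return True
```

Because each level is shuffled only within its parent group, every augmented sequence keeps elements and orbits in contiguous runs. Under these assumptions, a model can still read composition and symmetry from the block structure without being tied to one arbitrary order inside each group.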