🤖 AI Summary
In materials science, crystal structure information is often unavailable, which limits where structure-based property prediction models can be applied. To address this, the authors propose a scalable multimodal learning framework that operates without atomic structural inputs. The framework jointly processes elemental composition and X-ray diffraction (XRD) patterns as two modalities, using modality-specific encoders and a cross-attention fusion module. The authors introduce masked XRD modeling (MXM) and use it alongside cross-modal contrastive alignment as self-supervised pretraining strategies. On the 5-million-sample Alexandria dataset, pretraining speeds up convergence by up to 4.2× and improves both accuracy and representation quality, and the multimodal model scales more favorably with dataset size than unimodal baselines. The work moves away from the conventional structure-dependent paradigm and points toward foundation models for materials science that are grounded in experimental data.
📝 Abstract
Recent advances in materials discovery have been driven by structure-based models, particularly those using crystal graphs. While effective for computational datasets, these models are impractical for real-world applications where atomic structures are often unknown or difficult to obtain. We propose a scalable multimodal framework that learns directly from elemental composition and X-ray diffraction (XRD), two of the more readily available modalities in experimental workflows, without requiring crystal structure input. Our architecture integrates modality-specific encoders with a cross-attention fusion module and is trained on the 5-million-sample Alexandria dataset. We introduce masked XRD modeling (MXM) and apply it together with contrastive alignment as self-supervised pretraining strategies. Pretraining yields faster convergence (up to a 4.2x speedup) and improves both accuracy and representation quality. We further demonstrate that multimodal performance scales more favorably with dataset size than unimodal baselines, with gains compounding at larger data regimes. Our results establish a path toward structure-free, experimentally grounded foundation models for materials science.
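The two pretraining objectives named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the masking scheme, the reconstruction loss, or the form of the contrastive loss, so the patch-wise zero masking, the mean-squared-error reconstruction target, the symmetric InfoNCE loss, and all names and default values (`mask_ratio`, `patch_size`, `temperature`) are assumptions chosen for illustration.

```python
import numpy as np


def mask_xrd_pattern(pattern, mask_ratio=0.15, patch_size=8, rng=None):
    """Zero out a random subset of fixed-size patches of a 1D XRD pattern.

    Assumed scheme (not from the paper): the pattern is cut into
    non-overlapping patches and masked patches are replaced by zeros.
    Returns the masked pattern and a boolean flag per patch.
    """
    rng = np.random.default_rng() if rng is None else rng
    pattern = np.asarray(pattern, dtype=float)
    n_patches = len(pattern) // patch_size
    # Drop any trailing points that do not fill a whole patch.
    patches = pattern[: n_patches * patch_size].reshape(n_patches, patch_size).copy()

    n_masked = max(1, int(round(mask_ratio * n_patches)))
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    is_masked = np.zeros(n_patches, dtype=bool)
    is_masked[masked_idx] = True

    patches[is_masked] = 0.0
    return patches.reshape(-1), is_masked


def masked_reconstruction_loss(pred, target, is_masked, patch_size=8):
    """Mean squared error computed only over the masked patches."""
    pred_p = np.asarray(pred, dtype=float).reshape(-1, patch_size)
    target_p = np.asarray(target, dtype=float).reshape(-1, patch_size)
    return float(np.mean((pred_p[is_masked] - target_p[is_masked]) ** 2))


def contrastive_alignment_loss(comp_emb, xrd_emb, temperature=0.07):
    """Symmetric InfoNCE loss between composition and XRD embeddings.

    Row i of comp_emb and row i of xrd_emb are assumed to come from the
    same material (positive pair); all other rows act as negatives.
    """
    comp = comp_emb / np.linalg.norm(comp_emb, axis=1, keepdims=True)
    xrd = xrd_emb / np.linalg.norm(xrd_emb, axis=1, keepdims=True)
    logits = comp @ xrd.T / temperature  # (batch, batch) cosine similarities

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    loss_c2x = cross_entropy_on_diagonal(logits)
    loss_x2c = cross_entropy_on_diagonal(logits.T)
    return float(0.5 * (loss_c2x + loss_x2c))
```

In a real training loop, the masked pattern would be passed through the XRD encoder and a decoder head to produce `pred`, and the two embeddings would come from the modality-specific encoders; here both are left as plain arrays so that the objectives can be inspected in isolation.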