XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science

📅 2025-06-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In materials science, property prediction models usually depend on crystal structure information, which is often unavailable for experimentally measured materials and limits where those models can be applied. To address this, we propose a scalable multimodal learning framework that operates without atomic structural inputs. The framework jointly processes elemental composition and X-ray diffraction (XRD) patterns as two modalities, using modality-specific encoders and a cross-attention fusion module. We introduce masked XRD modeling (MXM) and apply it together with cross-modal contrastive alignment as self-supervised pretraining strategies. Trained on the 5-million-sample Alexandria dataset, the pretrained model converges up to 4.2× faster and improves both accuracy and representation quality, and multimodal performance scales more favorably with dataset size than unimodal baselines. This work points toward structure-free, experimentally grounded foundation models for materials science.

📝 Abstract

Recent advances in materials discovery have been driven by structure-based models, particularly those using crystal graphs. While effective for computational datasets, these models are impractical for real-world applications where atomic structures are often unknown or difficult to obtain. We propose a scalable multimodal framework that learns directly from elemental composition and X-ray diffraction (XRD), two of the more available modalities in experimental workflows, without requiring crystal structure input. Our architecture integrates modality-specific encoders with a cross-attention fusion module and is trained on the 5-million-sample Alexandria dataset. We present masked XRD modeling (MXM), and apply MXM and contrastive alignment as self-supervised pretraining strategies. Pretraining yields faster convergence (up to 4.2x speedup) and improves both accuracy and representation quality. We further demonstrate that multimodal performance scales more favorably with dataset size than unimodal baselines, with gains compounding at larger data regimes. Our results establish a path toward structure-free, experimentally grounded foundation models for materials science.
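To make the cross-attention fusion step more concrete, here is a minimal single-head sketch in plain Python (no dependencies, so it is slow and purely illustrative). The abstract does not say which modality supplies the queries, how many heads are used, or what the token dimensions are. Letting composition tokens attend over XRD tokens is an assumption, as are `d_model`, the token counts, and every function name below.

```python
import math
import random


def project(rows, weight):
    """Multiply each row vector (length d_in) by a d_in x d_out weight matrix."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in rows]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over all context tokens."""
    q = project(queries, w_q)
    k = project(context, w_k)
    v = project(context, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    fused = []
    for q_i in q:
        # Scaled dot-product scores of this query against every context token.
        weights = softmax([scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k])
        # Attention output: weighted average of the context value vectors.
        fused.append([sum(w * v_j[d] for w, v_j in zip(weights, v))
                      for d in range(len(v[0]))])
    return fused


def random_matrix(n_rows, n_cols, rng):
    """Random projection matrix standing in for learned weights."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(n_rows)) for _ in range(n_cols)]
            for _ in range(n_rows)]


rng = random.Random(0)
d_model = 8  # hypothetical embedding width
# Stand-ins for encoder outputs: 3 element tokens and 5 XRD pattern tokens.
composition_tokens = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(3)]
xrd_tokens = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(5)]
w_q, w_k, w_v = (random_matrix(d_model, d_model, rng) for _ in range(3))

fused = cross_attention(composition_tokens, xrd_tokens, w_q, w_k, w_v)
```

Each row of `fused` is a composition token re-expressed as a weighted mix of XRD value vectors, which is the sense in which the two modalities are fused. A practical implementation would use a tensor library, multiple heads, and learned weights.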

Problem

Research questions and friction points this paper is trying to address.

Overcoming reliance on atomic structure data in materials science models

Enabling multimodal learning from composition and XRD without crystal input

Improving accuracy and scalability with self-supervised pretraining strategies
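One of the pretraining strategies named in the abstract is contrastive alignment between the two modalities. The source does not give the loss, so the sketch below assumes a common symmetric InfoNCE (CLIP-style) objective. The temperature value, the function names, and the toy embeddings are all illustrative assumptions rather than the paper's settings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax_at(logits, index):
    """Log-probability assigned to `index` under a softmax over `logits`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[index] - log_z


def contrastive_alignment_loss(comp_emb, xrd_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of each input is assumed to describe the same material."""
    comp = [l2_normalize(v) for v in comp_emb]
    xrd = [l2_normalize(v) for v in xrd_emb]
    n = len(comp)
    # Cosine similarities between every composition/XRD pair, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(c, x)) / temperature for x in xrd] for c in comp]
    # Composition -> XRD direction: the matching XRD embedding is the positive.
    comp_to_xrd = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # XRD -> composition direction: same idea over the columns of the matrix.
    xrd_to_comp = -sum(log_softmax_at([sim[j][i] for j in range(n)], i)
                       for i in range(n)) / n
    return 0.5 * (comp_to_xrd + xrd_to_comp)


# Toy embeddings: identical pairs should score far better than mismatched pairs.
toy = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_aligned = contrastive_alignment_loss(toy, toy)
loss_shuffled = contrastive_alignment_loss(toy, toy[1:] + toy[:1])
```

With matching pairs the loss is close to zero, and it grows sharply once the pairing is shuffled. That gap is the signal pushing the composition and XRD encoders toward a shared embedding space.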

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal learning without crystal structure input

Self-supervised pretraining with masked XRD modeling

Scalable framework integrating composition and XRD
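Masked XRD modeling can be illustrated by analogy with other masked-modeling objectives: hide parts of the diffraction pattern and train the model to reconstruct them. The summary does not describe the actual masking scheme or loss. The contiguous-patch masking, the 50% mask ratio, the mean-squared-error objective, and all names below are assumptions made for this sketch.

```python
import random


def mask_xrd_pattern(intensities, patch_size=4, mask_ratio=0.5, mask_value=0.0, seed=None):
    """Hide random contiguous patches of a 1D XRD pattern.

    Returns the corrupted pattern and a per-point boolean mask (True = hidden).
    """
    rng = random.Random(seed)
    n_patches = len(intensities) // patch_size
    n_masked = max(1, round(mask_ratio * n_patches))
    masked_patches = rng.sample(range(n_patches), n_masked)
    corrupted = list(intensities)
    mask = [False] * len(intensities)
    for p in masked_patches:
        for i in range(p * patch_size, (p + 1) * patch_size):
            corrupted[i] = mask_value
            mask[i] = True
    return corrupted, mask


def masked_reconstruction_loss(predicted, target, mask):
    """Mean squared error computed only over the hidden points."""
    errors = [(p - t) ** 2 for p, t, m in zip(predicted, target, mask) if m]
    return sum(errors) / len(errors)


# Toy 16-point pattern with three peaks (made-up intensities).
pattern = [0.0, 0.1, 0.9, 0.2, 0.0, 0.0, 0.4, 1.0,
           0.3, 0.0, 0.0, 0.1, 0.6, 0.1, 0.0, 0.0]
corrupted, mask = mask_xrd_pattern(pattern, patch_size=4, mask_ratio=0.5, seed=0)

# Placeholder "model" that echoes its corrupted input, for illustration only.
baseline_loss = masked_reconstruction_loss(corrupted, pattern, mask)
perfect_loss = masked_reconstruction_loss(pattern, pattern, mask)
```

Because the loss is computed only at hidden positions, a model cannot lower it by copying the visible input. It has to infer the missing peaks from the rest of the pattern, and in a multimodal setup from the composition as well.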
