🤖 AI Summary
Crystal property prediction is held back by the high cost of experiments and DFT calculations, by ML models that depend on scarce labeled data, by representations that capture structure poorly, and by weak physical consistency. This work introduces CLOUD (Crystal Language mOdel for Unified and Differentiable materials modeling), a physics-informed, Transformer-based foundation model for crystals. First, it proposes Symmetry-Consistent Ordered Parameter Encoding (SCOPE), a compact, coordinate-free, symmetry-preserving string representation that unifies space group, Wyckoff positions, and composition. Second, it introduces the CLOUD-DEBYE framework, which embeds the Debye model differentiably so that temperature-dependent property predictions remain thermodynamically consistent. Third, after self-supervised pretraining on over six million crystal structures and fine-tuning on multiple downstream tasks, the model achieves competitive performance across a wide range of material properties and shows strong scaling behavior. Notably, CLOUD-DEBYE predicts phonon internal energy and heat capacity as functions of temperature without requiring additional data.
📝 Abstract
The prediction of crystal properties is essential for understanding structure-property relationships and accelerating the discovery of functional materials. However, conventional approaches relying on experimental measurements or density functional theory (DFT) calculations are often resource-intensive, limiting their scalability. Machine learning (ML) models offer a promising alternative by learning complex structure-property relationships from data, enabling faster predictions. Yet existing ML models often rely on labeled data, adopt representations that poorly capture essential structural characteristics, and lack integration with physical principles, all of which limit their generalizability and interpretability. Here, we introduce CLOUD (Crystal Language mOdel for Unified and Differentiable materials modeling), a transformer-based framework trained on a novel Symmetry-Consistent Ordered Parameter Encoding (SCOPE) that encodes crystal symmetry, Wyckoff positions, and composition in a compact, coordinate-free string representation. Pre-trained on over six million crystal structures, CLOUD is fine-tuned on multiple downstream tasks, achieves competitive performance in predicting a wide range of material properties, and demonstrates strong scaling performance. Furthermore, as a proof of concept of differentiable materials modeling, CLOUD is applied to predict the phonon internal energy and heat capacity. The resulting CLOUD-DEBYE framework integrates the Debye model, which enforces thermodynamic consistency and enables temperature-dependent property prediction without requiring additional data. These results demonstrate the potential of CLOUD as a scalable and physics-informed foundation model for crystalline materials, unifying symmetry-consistent representations with physically grounded learning for property prediction and materials discovery.
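To make "thermodynamic consistency" concrete, the following is a minimal sketch of the textbook Debye model, not the CLOUD-DEBYE implementation. It assumes that some upstream model supplies a Debye temperature `theta_d`; how CLOUD actually couples to the Debye model is not specified in the abstract. The function names, the Simpson integrator, and the 400 K value in the usage note are illustrative assumptions. In the Debye model both the phonon internal energy U(T) (zero-point energy omitted) and the heat capacity C_V(T) follow from that single parameter, and C_V = dU/dT holds by construction.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def _simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def _u_integrand(x):
    # x^3 / (e^x - 1), written with e^{-x} so large x cannot overflow.
    # Near x = 0 the integrand behaves like x^2, which avoids a 0/0.
    if x < 1e-8:
        return x * x
    return x**3 * math.exp(-x) / (-math.expm1(-x))


def _cv_integrand(x):
    # x^4 e^x / (e^x - 1)^2, again in the overflow-safe e^{-x} form.
    if x < 1e-8:
        return x * x
    return x**4 * math.exp(-x) / math.expm1(-x) ** 2


def debye_internal_energy(temperature, theta_d, n_atoms=1.0):
    """Phonon internal energy in J (zero-point energy omitted).

    U = 9 N k_B T (T/theta_d)^3 * integral_0^{theta_d/T} x^3/(e^x - 1) dx
    """
    y = theta_d / temperature
    integral = _simpson(_u_integrand, 0.0, y)
    return 9.0 * n_atoms * K_B * temperature * y**-3 * integral


def debye_heat_capacity(temperature, theta_d, n_atoms=1.0):
    """Constant-volume heat capacity in J/K.

    C_V = 9 N k_B (T/theta_d)^3 * integral_0^{theta_d/T} x^4 e^x/(e^x - 1)^2 dx
    """
    y = theta_d / temperature
    integral = _simpson(_cv_integrand, 0.0, y)
    return 9.0 * n_atoms * K_B * y**-3 * integral
```

Calling `debye_heat_capacity(300.0, 400.0)` and `debye_internal_energy(300.0, 400.0)` with the same hypothetical `theta_d` gives a heat capacity and an energy that cannot contradict each other, because both derive from one parameter. A model that regressed the two quantities independently would have no such guarantee. In an automatic-differentiation framework the same expressions could be differentiated with respect to `theta_d`, which is the sense in which such a physics layer can be trained end to end.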