CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning

📅 2025-06-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Crystal property prediction is held back by the high cost of experiments and DFT calculations, by the reliance of existing ML models on scarce labeled data, by representations that capture crystal structure poorly, and by a lack of physical consistency. This work introduces CLOUD, a physics-informed Transformer foundation model for crystals. First, it proposes Symmetry-Consistent Ordered Parameter Encoding (SCOPE), a coordinate-free, symmetry-preserving string representation that combines space group, Wyckoff positions, and composition into a compact, interpretable encoding of the crystal structure. Second, it introduces the CLOUD-DEBYE framework, which embeds the Debye model differentiably to ensure thermodynamically consistent, temperature-dependent property prediction. Third, after pre-training on over 6 million crystal structures and fine-tuning on multiple downstream tasks, the model achieves competitive performance across a wide range of material properties and shows strong scaling behavior. Notably, CLOUD-DEBYE predicts the temperature dependence of phonon internal energy and heat capacity without requiring additional data, while enforcing thermodynamic consistency.
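
To make "coordinate-free" concrete, the Python sketch below serializes a crystal from its space-group number and its occupied Wyckoff sites, then derives the reduced composition from the site multiplicities. The string layout, the `scope_like_string` function, and the `Site` tuple are hypothetical illustrations, not the published SCOPE grammar; the sketch only shows how space group, Wyckoff positions, and composition can be written out without atomic coordinates, using a canonical site order so that equivalent inputs give identical strings.

```python
from collections import Counter
from functools import reduce
from math import gcd

# Hypothetical site record: (element, Wyckoff letter, multiplicity).
# Illustrative only; this is NOT the published SCOPE format.
Site = tuple[str, str, int]


def scope_like_string(space_group: int, sites: list[Site]) -> str:
    """Encode a crystal as a string of symmetry labels with no coordinates."""
    if not 1 <= space_group <= 230:
        raise ValueError("space group number must be in 1..230")
    if not sites:
        raise ValueError("at least one occupied Wyckoff site is required")
    # Canonical order (Wyckoff letter, then element): listing the same
    # sites in a different order yields the same string.
    ordered = sorted(sites, key=lambda s: (s[1], s[0]))
    site_part = " ".join(f"{el}:{mult}{letter}" for el, letter, mult in ordered)
    # Composition follows from the multiplicities; reduce by their gcd.
    counts = Counter()
    for el, _, mult in ordered:
        counts[el] += mult
    g = reduce(gcd, counts.values())
    comp_part = " ".join(f"{el}{n // g}" for el, n in sorted(counts.items()))
    return f"SG{space_group} | {site_part} | {comp_part}"


# Rock salt NaCl, Fm-3m (No. 225): Na on 4a, Cl on 4b.
print(scope_like_string(225, [("Cl", "b", 4), ("Na", "a", 4)]))
# SG225 | Na:4a Cl:4b | Cl1 Na1
```

Because only symmetry labels are stored, the string carries no Cartesian or fractional coordinates. The trade-off is that free parameters of a Wyckoff position (for example the x coordinate of the O 4f site in rutile) are not recorded in this sketch; how SCOPE itself handles such parameters is not covered by this summary.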

📝 Abstract
The prediction of crystal properties is essential for understanding structure-property relationships and accelerating the discovery of functional materials. However, conventional approaches relying on experimental measurements or density functional theory (DFT) calculations are often resource-intensive, limiting their scalability. Machine learning (ML) models offer a promising alternative by learning complex structure-property relationships from data, enabling faster predictions. Yet existing ML models often rely on labeled data, adopt representations that poorly capture essential structural characteristics, and lack integration with physical principles; these factors limit their generalizability and interpretability. Here, we introduce CLOUD (Crystal Language mOdel for Unified and Differentiable materials modeling), a transformer-based framework trained on a novel Symmetry-Consistent Ordered Parameter Encoding (SCOPE) that encodes crystal symmetry, Wyckoff positions, and composition in a compact, coordinate-free string representation. Pre-trained on over six million crystal structures, CLOUD is fine-tuned on multiple downstream tasks, achieves competitive performance in predicting a wide range of material properties, and demonstrates strong scaling behavior. Furthermore, as a proof of concept of differentiable materials modeling, CLOUD is applied to predict the phonon internal energy and heat capacity through a framework that integrates the Debye model. This CLOUD-DEBYE framework enforces thermodynamic consistency and enables temperature-dependent property prediction without requiring additional data. These results demonstrate the potential of CLOUD as a scalable and physics-informed foundation model for crystalline materials, unifying symmetry-consistent representations with physically grounded learning for property prediction and materials discovery.
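
The Debye model gives closed-form temperature dependence once a single material parameter, the Debye temperature theta_D, is known. The pure-Python sketch below assumes, as an illustration rather than the paper's confirmed formulation, that a network supplies theta_D and the Debye integrals are evaluated from it; the function names and the inclusion of the zero-point term are choices made here. It uses C_V = 9 N k_B (T/theta_D)^3 * integral_0^(theta_D/T) x^4 e^x / (e^x - 1)^2 dx and U = 9 N k_B T (T/theta_D)^3 * integral_0^(theta_D/T) x^3 / (e^x - 1) dx + (9/8) N k_B theta_D, and checks numerically that C_V = dU/dT, one concrete form of the thermodynamic consistency described above.

```python
import math


def _simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n is bumped to an even count."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def _cv_integrand(x):
    """x^4 e^x / (e^x - 1)^2, rewritten with e^-x so large x cannot overflow."""
    if x == 0.0:
        return 0.0  # the integrand behaves like x^2 near 0
    return x**4 * math.exp(-x) / math.expm1(-x) ** 2


def _u_integrand(x):
    """x^3 / (e^x - 1), rewritten with e^-x so large x cannot overflow."""
    if x == 0.0:
        return 0.0  # the integrand behaves like x^2 near 0
    return x**3 * math.exp(-x) / -math.expm1(-x)


def debye_heat_capacity(T, theta_D, n_atoms=1):
    """Isochoric heat capacity C_V in units of k_B for n_atoms atoms."""
    x_d = theta_D / T
    return 9.0 * n_atoms * (T / theta_D) ** 3 * _simpson(_cv_integrand, 0.0, x_d)


def debye_internal_energy(T, theta_D, n_atoms=1, zero_point=True):
    """Phonon internal energy in units of k_B * K for n_atoms atoms."""
    x_d = theta_D / T
    u = 9.0 * n_atoms * T * (T / theta_D) ** 3 * _simpson(_u_integrand, 0.0, x_d)
    if zero_point:
        u += 9.0 / 8.0 * n_atoms * theta_D  # Debye zero-point energy
    return u


# Consistency check: C_V should match a central difference of U in T.
theta = 300.0  # hypothetical Debye temperature in K (stand-in for a model output)
T, dT = 200.0, 1e-3
dU_dT = (debye_internal_energy(T + dT, theta) - debye_internal_energy(T - dT, theta)) / (2 * dT)
print(dU_dT, debye_heat_capacity(T, theta))  # the two numbers agree closely
```

At high temperature C_V approaches the Dulong-Petit value 3 N k_B, and at low temperature it follows the T^3 law. In a trained model theta_D would come from the network, and writing the same formulas in an autodiff framework would let gradients from temperature-dependent losses flow back to the predicted theta_D; the plain-float version above only illustrates the physics.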
Problem

Research questions and friction points this paper is trying to address.

Predicting crystal properties efficiently without resource-intensive experiments or DFT calculations
Improving ML models by integrating physical principles and symmetry
Enhancing scalability and interpretability in materials property prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based framework for crystal representation
Symmetry-Consistent Ordered Parameter Encoding (SCOPE)
Integrates the Debye model for thermodynamic consistency