🤖 AI Summary
Existing structural analysis tools (e.g., CNA, PTM) rely on handcrafted templates, support only a limited set of crystal lattices, struggle with strong thermal disorder and complex defects, and yield hard labels without confidence estimates or continuous order parameters. To address these limitations, we propose the first unified probabilistic foundation model that jointly performs crystal denoising, phase classification, and order parameter extraction. Leveraging the MACE-MP interatomic potential and the AFLOW prototype library, the model directly predicts per-atom logits over hundreds of crystal phases and a global log-density. Differentiable denoising is achieved via gradient ascent on this log-density; phase labels are obtained via argmax over the per-atom logits, while the logits themselves serve as defect-sensitive, geometrically interpretable continuous order parameters. The model generalizes across hundreds of prototypes and remains robust and accurate in challenging systems, including ice polytypes, ice–water interfaces, and shock-compressed titanium.
📝 Abstract
Atomistic simulations generate large volumes of noisy structural data, but extracting phase labels, order parameters (OPs), and defect information in a way that is universal, robust, and interpretable remains challenging. Existing tools such as PTM and CNA are restricted to a small set of hand-crafted lattices (e.g., FCC/BCC/HCP), degrade under strong thermal disorder or defects, and produce hard, template-based labels without per-atom probability or confidence scores. Here we introduce a log-probability foundation model that unifies denoising, phase classification, and OP extraction within a single probabilistic framework. We reuse the MACE-MP foundation interatomic potential on crystal structures mapped to AFLOW prototypes, training it to predict per-atom, per-phase logits $l$ and to aggregate them into a global log-density $\log \hat{P}_\theta(\boldsymbol{r})$ whose gradient defines a conservative score field. Denoising corresponds to gradient ascent on this learned log-density, phase labels follow from $\arg\max_c l_{ac}$, and the $l$ values act as continuous, defect-sensitive and interpretable OPs quantifying the Euclidean distance to ideal phases. We demonstrate universality across hundreds of prototypes, robustness under strong thermal and defect-induced disorder, and accurate treatment of complex systems such as ice polymorphs, ice–water interfaces, and shock-compressed Ti.
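To make the three operations in the abstract concrete, the following is a minimal toy sketch in plain Python. It is not the authors' model: the MACE-MP network is replaced by a hand-written Gaussian stand-in in which each logit is the negative squared Euclidean distance from an atom to a hypothetical ideal site, and the aggregation into a global log-density (a sum over atoms of a log-sum-exp over phases) is an assumption made for illustration, since the abstract does not specify it. The names `IDEAL_SITES`, `SIGMA`, `logits`, `log_density`, `score`, `denoise`, and `classify` are all invented for this example.

```python
import math

# Hypothetical toy "phases": one ideal 2D site per atom and per phase.
# These stand in for AFLOW prototypes; they are NOT taken from the paper.
IDEAL_SITES = {
    "square": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "sheared": [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0), (1.5, 1.0)],
}
PHASES = list(IDEAL_SITES)
SIGMA = 0.15  # assumed width of the toy Gaussian around each ideal site


def logits(positions):
    """Per-atom, per-phase logits l[a][c] = -|r_a - mu_ac|^2 / (2 sigma^2)."""
    rows = []
    for a, (x, y) in enumerate(positions):
        row = []
        for c in PHASES:
            mx, my = IDEAL_SITES[c][a]
            row.append(-((x - mx) ** 2 + (y - my) ** 2) / (2 * SIGMA ** 2))
        rows.append(row)
    return rows


def log_density(positions):
    """Toy global log-density: sum over atoms of logsumexp over phases."""
    total = 0.0
    for row in logits(positions):
        m = max(row)
        total += m + math.log(sum(math.exp(v - m) for v in row))
    return total


def score(positions):
    """Analytic gradient of log_density with respect to each position."""
    grads = []
    for a, row in enumerate(logits(positions)):
        m = max(row)
        w = [math.exp(v - m) for v in row]
        z = sum(w)
        x, y = positions[a]
        gx = gy = 0.0
        for k, c in enumerate(PHASES):
            mx, my = IDEAL_SITES[c][a]
            # Softmax-weighted pull toward each phase's ideal site.
            gx += (w[k] / z) * (mx - x) / SIGMA ** 2
            gy += (w[k] / z) * (my - y) / SIGMA ** 2
        grads.append((gx, gy))
    return grads


def denoise(positions, step=0.005, n_steps=200):
    """Denoise by plain gradient ascent on the toy log-density."""
    for _ in range(n_steps):
        g = score(positions)
        positions = [
            (x + step * gx, y + step * gy)
            for (x, y), (gx, gy) in zip(positions, g)
        ]
    return positions


def classify(positions):
    """Hard phase label per atom: argmax over that atom's logits."""
    return [
        PHASES[max(range(len(PHASES)), key=row.__getitem__)]
        for row in logits(positions)
    ]


# Usage: a thermally perturbed "square" configuration.
noisy = [(0.05, -0.04), (0.93, 0.06), (0.08, 1.05), (1.04, 0.96)]
denoised = denoise(noisy)
labels = classify(noisy)
order_parameters = logits(noisy)  # continuous values; closer to 0 = nearer ideal
```

Because the score is the exact gradient of a scalar log-density, the toy field is conservative by construction, which mirrors the property the abstract attributes to the learned model. In this toy, the first two atoms occupy the same ideal site in both phases, so their logits tie and only the last two atoms discriminate between the phases.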