🤖 AI Summary
This work addresses the limited generalization of existing machine learning interatomic potentials (MLIPs) when transferred across chemical systems, since their learned representations often lose compositional and structural information. The authors propose TriForces, a model-agnostic tri-stream framework that, for the first time, disentangles composition, structure, and their joint information. By leveraging self-supervised pretraining, TriForces constructs transferable atomic representations without requiring DFT labels, thereby enhancing cross-domain generalization. The method enables efficient similarity-based structure retrieval and outperforms current baselines on MatBench and QM9. Notably, on OMat24, it reduces energy MAE by 57% using only 20,000 training samples and consistently improves force prediction accuracy across varying dataset sizes.
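To make the three kinds of information concrete, the toy sketch below computes hand-crafted stand-ins for each stream. It is not the TriForces architecture: TriForces works on learned MLIP representations. The functions `composition_features`, `structure_features`, and `joint_features` are hypothetical illustrations. Composition ignores positions, structure ignores species, and the joint view keeps both. Swapping atoms in a linear CO2-like geometry leaves the first two views unchanged but alters the joint view, which shows why joint information is worth a separate stream.

```python
from collections import Counter
from itertools import combinations
import math

def composition_features(species):
    """Composition-only view (hypothetical): element fractions, positions ignored."""
    counts = Counter(species)
    n = len(species)
    return {el: c / n for el, c in sorted(counts.items())}

def structure_features(positions, cutoff=3.0, n_bins=6):
    """Structure-only view (hypothetical): pair-distance histogram, species ignored."""
    hist = [0] * n_bins
    width = cutoff / n_bins
    for a, b in combinations(positions, 2):
        d = math.dist(a, b)
        if d < cutoff:
            # Clamp guards against float rounding for d just below the cutoff.
            hist[min(int(d / width), n_bins - 1)] += 1
    return hist

def joint_features(species, positions, cutoff=3.0):
    """Joint view (hypothetical): pair distances grouped by element pair."""
    pairs = {}
    for (sa, a), (sb, b) in combinations(zip(species, positions), 2):
        d = math.dist(a, b)
        if d < cutoff:
            key = tuple(sorted((sa, sb)))
            pairs.setdefault(key, []).append(round(d, 3))
    return {k: sorted(v) for k, v in sorted(pairs.items())}

# Linear three-atom geometry (distances in angstroms, illustrative only).
co2_positions = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-1.2, 0.0, 0.0)]
species_a = ["C", "O", "O"]  # carbon in the middle
species_b = ["O", "C", "O"]  # same atoms, carbon moved to one end

print(composition_features(species_a) == composition_features(species_b))  # True
print(structure_features(co2_positions))                                   # [0, 0, 2, 0, 1, 0]
print(joint_features(species_a, co2_positions))
print(joint_features(species_b, co2_positions))
```

The two composition vectors and the single structure histogram are shared by both arrangements, while the joint features differ (`{('C', 'O'): [1.2, 1.2], ('O', 'O'): [2.4]}` versus `{('C', 'O'): [1.2, 2.4], ('O', 'O'): [1.2]}`).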
📝 Abstract
Machine learning interatomic potentials (MLIPs) achieve excellent accuracy when trained on large Density Functional Theory (DFT) datasets. To be useful in practice, they must often be adapted to target chemistries using small, expensive task-specific datasets. However, MLIPs transfer inconsistently across domains, and their representations often lose accessible composition and structure information. To address this, we present TriForces, a model-agnostic three-stream framework that separates composition and structure information and combines this separation with self-supervised learning to preserve transferable representations. TriForces improves performance over baselines on MatBench and QM9 without needing DFT labels, and its learned latent space enables efficient retrieval of similar structures. On OMat24, in the limited-data training regime, TriForces reduces energy MAE by 57% with only 20K training samples and improves force MAE across sample sizes. We release pretrained TriForces variants across multiple MLIP architectures with code at https://github.com/Ramlaoui/triforces.
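Similar-structure retrieval in a learned latent space usually reduces to nearest-neighbor search over per-structure embeddings. The sketch below shows that generic technique. It assumes two things: atomic embeddings are mean-pooled into one vector per structure, and similarity is measured with cosine similarity. The abstract does not specify either choice, and the names `mean_pool`, `retrieve_similar`, and the toy vectors are hypothetical, not the TriForces API.

```python
import math

def mean_pool(atom_embeddings):
    """Average per-atom embedding vectors into one structure-level vector (assumed pooling)."""
    n = len(atom_embeddings)
    return [sum(col) / n for col in zip(*atom_embeddings)]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def retrieve_similar(query, database, k=2):
    """Return the ids of the k database structures most similar to the query embedding."""
    scored = [(cosine_similarity(query, emb), sid) for sid, emb in database.items()]
    scored.sort(key=lambda t: t[0], reverse=True)  # highest similarity first
    return [sid for _, sid in scored[:k]]

# Toy database of structure embeddings (made-up numbers for illustration).
database = {
    "struct_a": mean_pool([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),  # pools to [0.9, 0.1, 0.0]
    "struct_b": [0.0, 1.0, 0.0],
    "struct_c": [0.7, 0.3, 0.0],
}
query = [1.0, 0.1, 0.0]
print(retrieve_similar(query, database, k=2))  # ['struct_a', 'struct_c']
```

A linear scan is enough for a sketch. For large structure databases, an approximate nearest-neighbor index over the same embeddings would be the usual replacement.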