🤖 AI Summary
This work addresses the challenge of efficiently exploring vast chemical spaces when atomic-scale materials data are highly imbalanced, multi-fidelity, and computationally expensive to generate. We propose an atomistic graph foundation model based on HydraGNN that uses a multi-task architecture with per-dataset heads to train jointly on 16 open first-principles datasets, with a PaiNN-based model emerging as the lead candidate. DeepHyper drives hyperparameter optimization and ADIOS2/DDStore provides the scalable data pipeline, supporting sustained 2,048-node training on the exascale supercomputer Frontier; we also quantify precision-performance tradeoffs across BF16, FP32, and FP64. The resulting model screens 1.1 billion atomic structures in 50 seconds (about 22 million structures per second), compressing a workload that would otherwise require years of first-principles computation. It also transfers to 12 chemically diverse downstream tasks.
📝 Abstract
We present an exascale workflow for materials discovery using atomistic graph foundation models built on HydraGNN. We jointly train on 16 open first-principles datasets (544+ million structures covering 85+ elements) using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On Frontier, we execute six large-scale DeepHyper hyperparameter optimization campaigns in FP64 and promote the top-performing message-passing models to sustained 2,048-node training, yielding a PaiNN-based lead model. The resulting model enables billion-scale screening, evaluating 1.1 billion atomistic structures in 50 seconds, compressing a workload that would require years of first-principles computation, and supports data-scarce fine-tuning across diverse downstream tasks. We quantify precision-performance tradeoffs (BF16/FP32/FP64), demonstrate transfer across twelve chemically diverse downstream tasks, and establish seamless strong- and weak-scaling across Frontier, Aurora, and Perlmutter. This work allows fast and reliable exploration of vast chemical design spaces that are otherwise inaccessible to first-principles methods.
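To put the screening figure in perspective, the short sketch below works out the implied throughput from the two numbers quoted in the abstract (1.1 billion structures, 50 seconds). The comparison against first-principles cost is purely illustrative: `ASSUMED_DFT_SECONDS_PER_STRUCTURE` is a hypothetical placeholder and does not come from the paper, so the resulting "years" figure should be read only as an order-of-magnitude illustration under that assumption.

```python
# Figures quoted in the abstract.
STRUCTURES_SCREENED = 1_100_000_000
WALL_TIME_SECONDS = 50

# ASSUMPTION (not from the paper): a hypothetical serial cost of one
# first-principles calculation, used only to illustrate the scale gap.
ASSUMED_DFT_SECONDS_PER_STRUCTURE = 60.0

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def throughput(n_structures: int, seconds: float) -> float:
    """Aggregate structures evaluated per second of wall time."""
    return n_structures / seconds


def serial_years(n_structures: int, seconds_each: float) -> float:
    """Years needed to process every structure one after another."""
    return n_structures * seconds_each / SECONDS_PER_YEAR


rate = throughput(STRUCTURES_SCREENED, WALL_TIME_SECONDS)
years = serial_years(STRUCTURES_SCREENED, ASSUMED_DFT_SECONDS_PER_STRUCTURE)

print(f"Screening throughput: {rate:,.0f} structures/s")
print(f"Hypothetical serial first-principles cost: {years:,.0f} years")
```

The throughput is an aggregate over the whole machine allocation; the abstract does not state how many nodes the screening run used, so no per-node rate is derived here.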