Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data

📅 2026-04-15

🤖 AI Summary
This work addresses the challenge of exploring vast chemical spaces efficiently when atomistic materials data are highly imbalanced, multi-fidelity, and expensive to compute. We propose an atomistic graph foundation model built on HydraGNN, with a multi-task architecture that uses per-dataset heads to train jointly on 16 open first-principles datasets. DeepHyper hyperparameter optimization campaigns, run in FP64 on Frontier, identify the top-performing message-passing models, which are then promoted to sustained 2,048-node training; the lead model is PaiNN-based. An ADIOS2/DDStore pipeline provides scalable data loading, and we quantify the precision-performance tradeoffs of BF16, FP32, and FP64. The resulting model evaluates 1.1 billion atomistic structures in 50 seconds, a workload that would require years of first-principles computation. It also transfers to twelve chemically diverse downstream tasks, including data-scarce fine-tuning.
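The screening figure above implies an aggregate throughput that is easy to check by hand. The short sketch below does that arithmetic; the helper names `structures_per_second` and `seconds_to_screen` are illustrative and do not come from the paper or from HydraGNN, and the only inputs are the two numbers reported in the abstract.

```python
def structures_per_second(n_structures: int, wall_seconds: float) -> float:
    """Aggregate screening throughput in structures per second."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return n_structures / wall_seconds


def seconds_to_screen(n_structures: int, rate: float) -> float:
    """Wall time needed to screen n_structures at a given aggregate rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return n_structures / rate


# Figures reported in the abstract: 1.1 billion structures in 50 seconds.
rate = structures_per_second(1_100_000_000, 50)
print(f"{rate:,.0f} structures/s")  # 22,000,000 structures/s

# Time to screen a hypothetical library of 100 million candidates at that rate.
# The library size is an illustrative assumption, not a number from the paper.
print(f"{seconds_to_screen(100_000_000, rate):.2f} s")
```

At the reported rate, the model evaluates on the order of tens of millions of structures per second in aggregate; the abstract does not state how many nodes the screening run used, so no per-node rate is derived here.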

📝 Abstract
We present an exascale workflow for materials discovery using atomistic graph foundation models built on HydraGNN. We jointly train on 16 open first-principles datasets (544+ million structures covering 85+ elements) using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On Frontier, we execute six large-scale DeepHyper hyperparameter optimization campaigns in FP64 and promote the top-performing message-passing models to sustained 2,048-node training, yielding a PaiNN-based lead model. The resulting model enables billion-scale screening, evaluating 1.1 billion atomistic structures in 50 seconds, compressing a workload that would require years of first-principles computation, and supports data-scarce fine-tuning across diverse downstream tasks. We quantify precision-performance tradeoffs (BF16/FP32/FP64), demonstrate transfer across twelve chemically diverse downstream tasks, and establish seamless strong- and weak-scaling across Frontier, Aurora, and Perlmutter. This work allows fast and reliable exploration of vast chemical design spaces that are otherwise inaccessible to first-principles methods.
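To make the "multi-task architecture with per-dataset heads" concrete, here is a minimal pure-Python sketch of the general pattern: one shared trunk, one output head per source dataset, and routing by dataset name. It is not HydraGNN's implementation and contains no PaiNN message passing. The class `MultiHeadModel`, the dataset names, the toy linear layers, and the choice to average per-dataset losses so that large datasets do not dominate are all assumptions made for illustration.

```python
import random


class MultiHeadModel:
    """Toy shared trunk with one linear output head per dataset."""

    def __init__(self, dataset_names, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Shared trunk: applied to every sample, whatever its source dataset.
        self.trunk = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                      for _ in range(hidden_dim)]
        # One head per dataset, so labels of different fidelity
        # never share an output layer.
        self.heads = {name: [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                      for name in dataset_names}

    def encode(self, features):
        # Shared representation: ReLU(W x).
        return [max(0.0, sum(w * x for w, x in zip(row, features)))
                for row in self.trunk]

    def predict(self, features, dataset):
        # Route the shared representation to the head of the source dataset.
        hidden = self.encode(features)
        return sum(w * h for w, h in zip(self.heads[dataset], hidden))


def multitask_loss(model, batch):
    """Mean of per-dataset mean squared errors (an assumed balancing choice)."""
    per_dataset = {}
    for features, target, dataset in batch:
        err = (model.predict(features, dataset) - target) ** 2
        per_dataset.setdefault(dataset, []).append(err)
    means = [sum(errs) / len(errs) for errs in per_dataset.values()]
    return sum(means) / len(means)


# Hypothetical, deliberately imbalanced batch: three samples from one
# dataset and one from another. Names and values are placeholders.
model = MultiHeadModel(["dataset_a", "dataset_b"], in_dim=3, hidden_dim=4)
batch = [
    ([0.1, 0.2, 0.3], 1.0, "dataset_a"),
    ([0.4, 0.1, 0.0], 0.5, "dataset_a"),
    ([0.3, 0.3, 0.3], 0.8, "dataset_a"),
    ([0.2, 0.5, 0.1], -1.0, "dataset_b"),
]
print(multitask_loss(model, batch))
```

Averaging within each dataset before averaging across datasets gives the single `dataset_b` sample the same weight as the three `dataset_a` samples combined. That is one simple way to handle imbalance; the abstract does not say which weighting scheme the authors use.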
Problem

Research questions and friction points this paper is trying to address.

materials discovery
imbalanced data
multi-fidelity data
atomistic graph models
exascale computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exascale computing
Graph foundation models
Multi-task learning
Atomistic data
High-throughput screening
Massimiliano Lupo Pasini
Oak Ridge National Laboratory
Artificial Intelligence, Deep Learning, Machine Learning, Numerical Analysis, High Performance Computing
Jong Youl Choi
Oak Ridge National Laboratory
big data science, data intensive computing, data mining
Kshitij Mehta
Computer Scientist, Oak Ridge National Lab
Richard Messerly
National Center for Computational Sciences Division, Oak Ridge National Laboratory
Rylie Weaver
Bredesen Center Data Science and Engineering, University of Tennessee, Knoxville
Linda Ungerboeck
Bredesen Center Data Science and Engineering, University of Tennessee, Knoxville
Isaac Lyngaas
National Center for Computational Sciences Division, Oak Ridge National Laboratory
Benjamin Stump
Computational Sciences and Engineering Division, Oak Ridge National Laboratory
Ashwin M. Aji
AMD Research
High Performance Computing, Parallel Computation, Parallel Programming Models
Karl W. Schulz
University of Texas
HPC, research computing, modeling/simulation, machine learning, women's health
Jordà Polo
AMD Research
Performance Management, Cloud Computing, Distributed Systems, Data Analytics, Genomics