Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data

📅 2025-06-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Integrating heterogeneous, multi-source atomic modeling data and scaling graph neural network (GNN) pretraining to ultra-large graphs remain significant challenges. Method: We propose the first multi-task parallel pretraining paradigm tailored for million-scale molecular structures (24M+, spanning five datasets), built upon the open-source HydraGNN framework. Our approach innovatively integrates GNNs with cross-node task-parallel scheduling and heterogeneous memory-aware communication optimization. Contribution/Results: We achieve, for the first time, strong linear scalability up to 1,000 GPUs on three state-of-the-art exascale supercomputers—Perlmutter, Aurora, and Frontier. Throughput increases substantially, while model generalization and transferability across unseen chemical spaces are significantly enhanced. This work establishes a scalable, reproducible infrastructure and methodology for large-language-model–style atomic-scale modeling, advancing foundational capabilities for scientific AI in computational chemistry and materials science.

Technology Category

Application Category

📝 Abstract
Graph foundation models using graph neural networks promise sustainable, efficient atomistic modeling. To tackle challenges of processing multi-source, multi-fidelity data during pre-training, recent studies employ multi-task learning, in which shared message passing layers initially process input atomistic structures regardless of source, then route them to multiple decoding heads that predict data-specific outputs. This approach stabilizes pre-training and enhances a model's transferability to unexplored chemical regions. Preliminary results on approximately four million structures are encouraging, yet questions remain about generalizability to larger, more diverse datasets and scalability on supercomputers. We propose a multi-task parallelism method that distributes each head across computing resources with GPU acceleration. Implemented in the open-source HydraGNN architecture, our method was trained on over 24 million structures from five datasets and tested on the Perlmutter, Aurora, and Frontier supercomputers, demonstrating efficient scaling on all three highly heterogeneous super-computing architectures.
Problem

Research questions and friction points this paper is trying to address.

Robust pre-training of graph foundation models on multi-source data
Efficient scaling of multi-task parallelism on supercomputers
Enhancing model transferability to unexplored chemical regions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task parallelism for robust pre-training
GPU-accelerated distributed decoding heads
Scalable on heterogeneous supercomputing architectures
🔎 Similar Papers
No similar papers found.