MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow

📅 2025-01-18
🤖 AI Summary
Problem: High-throughput design of metal-organic frameworks (MOFs) suffers from decoupled generation and screening stages and from inefficient coordination of heterogeneous computing resources. Method: We introduce MOFA, an open-source workflow that integrates generative AI with multiscale physics-based simulations: diffusion models/VAEs for molecular generation, followed by molecular dynamics, density functional theory (DFT), and grand-canonical Monte Carlo (GCMC) simulations for screening. An online-learning framework schedules these tasks across CPUs and GPUs, closing the loop between generation and simulation. Contribution/Results: The modular architecture is designed for reuse in other scientific applications. On a 450-node supercomputer, the system runs AI and simulation tasks concurrently across more than a thousand GPUs and generates novel MOFs at high throughput. The best structures have CO₂ adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset, and the production of high-quality MOFs scales linearly with the number of nodes used, which supports the feasibility of HPC-scale generative AI for accelerated materials discovery.

📝 Abstract
We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in combining GPU-intensive GenAI tasks, including distributed training and inference, with CPU- and GPU-optimized tasks that screen and filter AI-generated MOFs using molecular dynamics, density functional theory, and Monte Carlo simulations. These heterogeneous tasks are unified within an online learning framework that optimizes the utilization of available CPU and GPU resources across HPC systems. Performance metrics from a run on a 450-node supercomputer (14,400 AMD Zen 3 CPUs and 1,800 NVIDIA A100 GPUs) demonstrate that MOFA achieves high-throughput generation of novel MOF structures, with CO₂ adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset. Furthermore, the production of high-quality MOFs scales linearly with the number of nodes utilized. The modular architecture of MOFA will facilitate its integration into other scientific applications that dynamically combine GenAI with large-scale simulations.
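
The closed loop described in the abstract (generate on GPUs, screen on CPUs, feed survivors back into training) can be illustrated with a toy sketch. This is not MOFA's implementation: every name below (generate_candidates, screen, retrain, run_loop) is hypothetical, thread pools stand in for GPU and CPU resources, and a candidate is reduced to a single number in place of a real MOF structure.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# All names here are hypothetical stand-ins, not MOFA APIs.

def generate_candidates(model_bias, n, rng):
    """'GPU' task: sample n candidate scores from the current generator."""
    return [rng.gauss(model_bias, 1.0) for _ in range(n)]

def screen(candidate):
    """'CPU' task: a cheap threshold standing in for MD/DFT/Monte Carlo screening."""
    return candidate if candidate > 0.5 else None

def retrain(model_bias, survivors):
    """Online-learning step: move the generator halfway toward the survivors' mean."""
    if not survivors:
        return model_bias
    mean = sum(survivors) / len(survivors)
    return model_bias + 0.5 * (mean - model_bias)

def run_loop(rounds=5, batch=64, gpu_workers=2, cpu_workers=8, seed=0):
    rng = random.Random(seed)
    bias, history = 0.0, []
    with ThreadPoolExecutor(gpu_workers) as gpu_pool, \
         ThreadPoolExecutor(cpu_workers) as cpu_pool:
        for _ in range(rounds):
            # Generation is submitted to the small "GPU" pool.
            candidates = gpu_pool.submit(
                generate_candidates, bias, batch, rng
            ).result()
            # Screening fans out over the larger "CPU" pool; map keeps input order.
            screened = list(cpu_pool.map(screen, candidates))
            survivors = [c for c in screened if c is not None]
            # Survivors feed back into the generator before the next round.
            bias = retrain(bias, survivors)
            history.append(len(survivors))
    return bias, history

if __name__ == "__main__":
    final_bias, survivors_per_round = run_loop()
    print(final_bias, survivors_per_round)
```

In this sketch each round waits for generation to finish before screening starts. A production workflow would instead keep both pools busy at once, which is the resource-utilization problem the abstract says the online learning framework addresses.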

Problem

Research questions and friction points this paper is trying to address.

CO₂ Capture
Metal-Organic Frameworks (MOFs)
High-Performance Computing (HPC)

Innovation

Methods, ideas, or system contributions that make the work stand out.

Artificial Intelligence
High-Performance Computing
Metal-Organic Frameworks
Xiaoli Yan
Argonne National Laboratory; Lemont, IL, United States, University of Illinois Chicago; Chicago, IL, United States
Nathaniel Hudson
Assistant Professor, Illinois Institute of Technology
Edge Computing, Edge Intelligence, Internet-of-Things, Social Networks, Cyber-Physical Systems
Hyun Park
Argonne National Laboratory; Lemont, IL, United States, University of Illinois Urbana-Champaign; Urbana, IL, United States
Daniel Grzenda
Graduate Student, University of Chicago
representation learning, spatio-temporal, deep learning systems, optimization
J. G. Pauloski
University of Chicago; Chicago, IL, United States
Marcus Schwarting
University of Chicago; Chicago, IL, United States
Haochen Pan
University of Chicago
Distributed Systems, Cloud Computing
Hassan Harb
Argonne National Laboratory; Lemont, IL, United States
Samuel Foreman
Argonne National Laboratory; Lemont, IL, United States
Chris Knight
Argonne National Laboratory; Lemont, IL, United States
Tom Gibbs
NVIDIA Inc.; Santa Clara, CA, United States
Kyle Chard
University of Chicago and Argonne National Laboratory
computer science, distributed systems, high performance computing, scientific computing
Santanu Chaudhuri
Argonne National Laboratory; Lemont, IL, United States, University of Illinois Chicago; Chicago, IL, United States
Emad Tajkhorshid
University of Illinois Urbana-Champaign; Urbana, IL, United States
Ian Foster
Argonne National Laboratory; Lemont, IL, United States, University of Chicago; Chicago, IL, United States
Mohamad Moosavi
University of Toronto; Toronto, Ontario
Logan Ward
NVIDIA (formerly Argonne National Laboratory)
AI for Science, High Performance Computing
E. Huerta
Argonne National Laboratory; Lemont, IL, United States, University of Chicago; Chicago, IL, United States, University of Illinois Urbana-Champaign; Urbana, IL, United States