🤖 AI Summary
Problem: Current high-throughput design of metal–organic frameworks (MOFs) treats generation and screening as decoupled stages and coordinates heterogeneous CPU and GPU resources inefficiently.
Method: We introduce an open-source workflow that integrates generative AI with multiscale physics-based simulations. It combines diffusion models and variational autoencoders (VAEs) for molecular generation with molecular dynamics (MD), density functional theory (DFT), and grand-canonical Monte Carlo (GCMC) simulations for screening. We further propose an online-learning-driven scheduling framework that adapts to available CPU and GPU resources, enabling closed-loop optimization between generation and simulation.
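The closed generation–simulation loop described in the Method can be illustrated with a toy sketch. Everything here is a hypothetical stand-in, not MOFA's code: `ToyGenerator` replaces the diffusion model/VAE, the three lambda "stages" replace MD, DFT, and GCMC with made-up scoring rules, and "online learning" is reduced to shifting the generator's sampling mean toward the best candidates so far. Each stage is tagged with a `cpu` or `gpu` pool and the sketch counts tasks dispatched to each pool; it does not model the adaptive resource allocation itself.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Candidate:
    ident: int
    latent: float                          # toy stand-in for a generated structure
    scores: dict = field(default_factory=dict)


class ToyGenerator:
    """Hypothetical stand-in for a diffusion model or VAE."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.mean = 0.0
        self.next_id = 0

    def sample(self, n: int) -> list[Candidate]:
        out = [Candidate(self.next_id + i, self.rng.gauss(self.mean, 1.0)) for i in range(n)]
        self.next_id += n
        return out

    def retrain(self, best: list[Candidate]) -> None:
        # Toy "online learning": move the sampling mean toward the best candidates.
        if best:
            self.mean = sum(c.latent for c in best) / len(best)


# Hypothetical screening stages: (name, resource pool, scoring function).
# A score of None means the candidate is filtered out at that stage.
STAGES = [
    ("md_stability", "gpu", lambda c: 1.0 if c.latent > -0.5 else None),
    ("dft_charges", "cpu", lambda c: round(c.latent, 3)),
    ("gcmc_co2_uptake", "cpu", lambda c: max(0.0, 2.0 + c.latent)),
]


def run_loop(rounds: int, batch: int, top_k: int = 5, seed: int = 0):
    gen = ToyGenerator(seed)
    tasks = {"cpu": 0, "gpu": 0}           # tasks dispatched to each resource pool
    accepted: list[Candidate] = []
    for _ in range(rounds):
        pool = gen.sample(batch)
        tasks["gpu"] += batch              # generation counts as a GPU task
        for name, resource, score in STAGES:
            tasks[resource] += len(pool)   # every surviving candidate runs this stage
            survivors = []
            for c in pool:
                s = score(c)
                if s is not None:
                    c.scores[name] = s
                    survivors.append(c)
            pool = survivors
        accepted.extend(pool)
        # Close the loop: retrain on the top-k structures by simulated CO2 uptake.
        best = sorted(accepted, key=lambda c: c.scores["gcmc_co2_uptake"], reverse=True)[:top_k]
        gen.retrain(best)
    return accepted, tasks, gen.mean


accepted, tasks, mean = run_loop(rounds=5, batch=100)
print(len(accepted), tasks, round(mean, 2))
```

The sketch runs the stages sequentially for clarity; in a workflow at supercomputer scale, generation and each simulation stage would run concurrently on separate CPU and GPU worker pools, which is where a scheduler that rebalances those pools becomes necessary.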
Contribution/Results: We present a modular scientific AI architecture designed for reuse across domains. Deployed on a 450-node supercomputer, the system co-executes AI and simulation tasks on 1,800 GPUs and achieves high-throughput generation of novel MOF structures. The best structures rank among the top 10 in the hMOF dataset for CO₂ adsorption capacity. The number of high-quality MOFs produced scales linearly with the number of compute nodes, demonstrating that generative AI can be scaled on HPC systems to accelerate materials discovery.
📝 Abstract
We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses the key challenge of integrating GPU-intensive GenAI tasks, including distributed training and inference, with the CPU- and GPU-optimized tasks that screen and filter AI-generated MOFs using molecular dynamics, density functional theory, and Monte Carlo simulations. These heterogeneous tasks are unified within an online learning framework that optimizes the utilization of available CPU and GPU resources across HPC systems. Performance metrics from a run on a 450-node supercomputer (14,400 AMD Zen 3 CPU cores and 1,800 NVIDIA A100 GPUs) demonstrate that MOFA achieves high-throughput generation of novel MOF structures, with CO₂ adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset. Furthermore, the number of high-quality MOFs produced scales linearly with the number of nodes used. MOFA's modular architecture will facilitate its integration into other scientific applications that dynamically combine GenAI with large-scale simulations.