🤖 AI Summary
Neural network molecular dynamics (NNMD) suffers from severe performance bottlenecks in systems with long-range electrostatic interactions, primarily due to computational overhead from neural network inference and Ewald summation. To address this, we propose a co-optimization framework tailored for exascale supercomputing platforms. Our approach introduces: (i) hardware-accelerated FFT offloading for efficient long-range force computation; (ii) fine-grained intra-core overlap between neural network inference and long-range force evaluation; (iii) a ring-based atom-level load-balancing scheme to minimize inter-node communication; and (iv) deep integration with the DPLR framework and Fugaku’s heterogeneous architecture. Evaluated on the Fugaku supercomputer, our method achieves up to a 37× speedup over baseline NNMD implementations, attaining a simulation throughput of 51 ns/day and surpassing the prior state of the art for NNMD with long-range electrostatics. This work establishes a scalable paradigm for high-fidelity, long-timescale simulations of large-scale ionic and polar systems.
📝 Abstract
Neural network-based molecular dynamics (NNMD) simulations that incorporate long-range electrostatic interactions have significantly extended NNMD's applicability to heterogeneous and ionic systems, enabling effective modeling of critical physical phenomena such as protein folding and dipolar surfaces while maintaining ab initio accuracy. However, neural network inference and long-range force computation remain the major bottlenecks, severely limiting simulation speed. In this paper, we target DPLR, a state-of-the-art NNMD package that supports long-range electrostatics, and propose a set of comprehensive optimizations to enhance computational efficiency. We introduce (1) a hardware-offloaded FFT method that reduces communication overhead; (2) an overlapping strategy that hides long-range force computation behind neural network inference using a single core per node; and (3) a ring-based load-balancing method that redistributes tasks evenly at the atom level with minimal communication overhead. Experimental results on the Fugaku supercomputer show that our work achieves a 37x performance improvement, reaching a maximum simulation speed of 51 ns/day.
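The overlapping strategy in (2) can be sketched in miniature: dedicate one worker to the long-range part of the force evaluation while the remaining compute runs short-range neural network inference, then combine the per-atom contributions. This is a hedged illustration only, not the DPLR implementation; the function names `short_range_inference` and `long_range_forces` and the toy per-atom arithmetic are placeholders invented here.

```python
# Minimal sketch (assumption: NOT the DPLR implementation) of hiding
# long-range force computation behind NN inference with one dedicated worker.
from concurrent.futures import ThreadPoolExecutor

def short_range_inference(atoms):
    # Stand-in for neural network inference on local atoms.
    return [a * 0.5 for a in atoms]

def long_range_forces(atoms):
    # Stand-in for the FFT-based long-range electrostatics part.
    return [a * 0.1 for a in atoms]

def md_step(atoms):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Launch long-range work on the dedicated worker...
        lr = pool.submit(long_range_forces, atoms)
        # ...while the calling thread runs NN inference concurrently.
        sr = short_range_inference(atoms)
        # Combine per-atom short- and long-range contributions.
        return [s + l for s, l in zip(sr, lr.result())]
```

In the real system the dedicated core would drive the offloaded FFT while the remaining cores on the node evaluate the network, so the long-range latency is largely hidden whenever it is shorter than the inference time.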