PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer

📅 2024-06-16

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Problem: Plane-wave density functional theory (PW-DFT) struggles to scale to ab initio simulations of ten thousand atoms or more on the new Sunway supercomputer. The main obstacle is the severe memory constraint of only 16 GB per process. Method: We propose a full-stack, architecture-aware parallel optimization framework for PW-DFT tailored to the Sunway many-core architecture. It integrates MPI+OpenMP hybrid parallelism, sparse fast Fourier transforms (FFTs), adaptive k-point sampling, low-rank density matrix compression, and customized many-core vectorization. Contribution/Results: Our approach extends PW-DFT calculations to a system of 16,384 carbon atoms while respecting the 16 GB per-process memory limit. On a 4,096-silicon-atom benchmark, it delivers a 64.8× speedup over the baseline implementation. This work addresses both the memory and the computational bottlenecks of PW-DFT on the Sunway platform and establishes a scalable, high-performance implementation path for large-scale materials simulations.

📝 Abstract

First-principles density functional theory (DFT) with a plane-wave (PW) basis set is the most widely used method in quantum mechanical materials simulations because of its accuracy and universality. However, a perceived drawback of PW-based DFT calculations is their substantial computational cost and memory usage, which currently limits their ability to simulate large-scale complex systems containing thousands of atoms. This situation is exacerbated on the new Sunway supercomputer, where each process is limited to a mere 16 GB of memory. Herein, we present a novel parallel implementation of plane-wave density functional theory on the new Sunway supercomputer (PWDFT-SW). PWDFT-SW fully exploits the capabilities of the Sunway supercomputer: we extensively refactored and calibrated our algorithms to match the characteristics of the Sunway system. Through extensive numerical experiments, we demonstrate that our methods substantially decrease both computational cost and memory usage. Our optimizations yield a speedup of 64.8x for a physical system containing 4,096 silicon atoms, enabling us to push the limit of PW-based DFT calculations to large-scale systems containing 16,384 carbon atoms.
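To make the memory argument concrete, the sketch below is a back-of-the-envelope model of the dense plane-wave coefficient matrix Ψ, with N_G plane waves × N_b occupied bands stored as complex128. It is not PWDFT-SW's memory model. The function names, the 500 plane waves per atom, and the three resident copies of Ψ are hypothetical assumptions chosen for illustration. Only the 4 valence electrons per atom (true for both silicon and carbon) and the 16 GB per-process budget come from physics or the abstract. Because both N_G and N_b grow linearly with the atom count, wavefunction memory grows quadratically. The O(N_G N_b²) cost of forming the overlap matrix for orthogonalization grows cubically.

```python
"""Back-of-the-envelope memory/cost model for plane-wave DFT wavefunctions.

Hypothetical illustration only: pw_per_atom, copies, and all function names
are assumptions, not values or APIs from PWDFT-SW.
"""

BYTES_PER_COMPLEX = 16  # complex128 = two 8-byte doubles
GIB = 2**30


def sizes(n_atoms, electrons_per_atom=4, pw_per_atom=500):
    """Return (n_pw, n_bands) for a closed-shell system.

    electrons_per_atom=4 matches the valence count of Si and C;
    pw_per_atom=500 is an assumed, cutoff-dependent value.
    """
    n_bands = n_atoms * electrons_per_atom // 2  # two electrons per band
    n_pw = n_atoms * pw_per_atom
    return n_pw, n_bands


def wavefunction_bytes(n_atoms, **kw):
    """Bytes for one dense copy of Psi (n_pw x n_bands, complex128)."""
    n_pw, n_bands = sizes(n_atoms, **kw)
    return n_pw * n_bands * BYTES_PER_COMPLEX


def per_process_gib(n_atoms, n_procs, **kw):
    """GiB per process if one copy of Psi is split evenly over n_procs."""
    return wavefunction_bytes(n_atoms, **kw) / n_procs / GIB


def min_processes(n_atoms, budget_gib=16, copies=3, **kw):
    """Smallest process count keeping `copies` copies of Psi under budget.

    Treats the 16 GB limit as 16 GiB for simplicity; copies=3 (e.g. Psi,
    H*Psi and one work array) is an assumption. All other arrays are ignored.
    """
    need = copies * wavefunction_bytes(n_atoms, **kw)
    cap = budget_gib * GIB
    return -(-need // cap)  # integer ceiling division


def orthogonalization_cost(n_atoms, **kw):
    """Leading-order multiply-add count for forming the overlap Psi^H Psi."""
    n_pw, n_bands = sizes(n_atoms, **kw)
    return n_pw * n_bands**2


if __name__ == "__main__":
    for n in (4096, 16384):
        # Prints "4096 250.0 47" and "16384 4000.0 750" under the defaults.
        print(n, per_process_gib(n, 1), min_processes(n))
    ratio = orthogonalization_cost(16384) // orthogonalization_cost(4096)
    print("cost ratio:", ratio)  # 64: 4x the atoms -> 4**3 times the work
```

Under these assumptions, quadrupling the system from 4,096 to 16,384 atoms multiplies wavefunction memory by 16 and overlap-matrix cost by 64. At that scale, no single 16 GB process can hold Ψ, so distributing it across many processes becomes unavoidable.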

Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost of plane-wave DFT

Lowering memory usage for large-scale simulations

Enabling DFT calculations for 16K-atom systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel implementation on Sunway supercomputer

Extensive algorithm refactoring for efficiency

Significantly reduces computational cost and memory usage
