🤖 AI Summary
Large-scale many-body problems in quantum chemistry face severe scalability bottlenecks due to the exponential computational cost of training neural network quantum states (NQS). This work proposes a high-performance NQS training framework addressing these challenges. Methodologically, it introduces: (1) a multi-level parallel strategy based on hybrid sampling, enabling co-optimization of local energy evaluation and parameter updates; (2) cache-centric memory management tailored for Transformer-based ansätze; and (3) fine-grained workload partitioning with communication–computation overlap. Evaluated on a 1536-node supercomputer, the framework achieves up to 8.41× strong scaling speedup and 95.8% parallel efficiency. It significantly accelerates training while enhancing memory stability—enabling scalable, ab initio electronic structure simulations for molecular systems comprising thousands of atoms.
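The two costs the summary singles out — drawing configurations from the ansatz and evaluating local energies on them — are the core of any variational Monte Carlo / NQS training step. A minimal toy sketch (not the paper's framework; the 2×2 Hamiltonian and trial wavefunction below are invented for illustration):

```python
import numpy as np

# Toy variational Monte Carlo energy estimate, illustrating the two costs
# the summary refers to: (1) sampling from |psi|^2, (2) local-energy evaluation.
rng = np.random.default_rng(0)

H = np.array([[1.0, -0.5],
              [-0.5, 2.0]])         # toy Hamiltonian over 2 basis states
psi = np.array([0.8, 0.6])          # toy (unnormalized) trial wavefunction

# 1) Sample basis states x with probability |psi(x)|^2.
p = psi**2 / np.sum(psi**2)
samples = rng.choice(2, size=100_000, p=p)

# 2) Local energy E_loc(x) = sum_x' H[x, x'] * psi(x') / psi(x).
e_loc = (H @ psi) / psi             # precomputed for both basis states
energy = e_loc[samples].mean()      # Monte Carlo estimate of <E>

# Sanity check against the exact variational energy <psi|H|psi>/<psi|psi>.
exact = psi @ H @ psi / (psi @ psi)
print(abs(energy - exact) < 1e-2)   # → True
```

In a real NQS run, `psi` is a neural network, the configuration space is exponentially large, and both the sampling loop and the local-energy sum become the scalability bottlenecks the framework parallelizes.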
📝 Abstract
Solving quantum many-body problems is one of the fundamental challenges in quantum chemistry. While neural network quantum states (NQS) have emerged as a promising computational tool, their training incurs exponentially growing computational demands, becoming prohibitively expensive for large-scale molecular systems and creating fundamental scalability barriers for real-world applications. To address these challenges, we present ours, a high-performance NQS training framework for *ab initio* electronic structure calculations. First, we propose a scalable sampling parallelism strategy with multi-layer workload division and a hybrid sampling scheme, which breaks the scalability barriers for large-scale NQS training. Second, we introduce multi-level local energy parallelism, enabling more efficient local energy computation. Finally, we employ cache-centric optimization for the Transformer-based *ansatz* and combine it with the sampling parallelism strategy, which further speeds up NQS training and achieves a stable memory footprint at scale. Experiments demonstrate that ours accelerates NQS training by up to 8.41× and attains a parallel efficiency of up to 95.8% when scaling to 1,536 nodes.
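The sampling-parallelism idea can be sketched in miniature: shard the Monte Carlo samples across workers with independent random streams, let each evaluate local energies on its shard, and reduce the partial sums. All names below (`worker`, `local_energy`, the Gaussian toy distribution) are illustrative stand-ins, not the paper's API or its multi-layer scheme:

```python
import numpy as np

def local_energy(xs):
    # stand-in for the expensive per-sample local-energy kernel
    return np.cos(xs) + 0.1 * xs**2

def worker(rank, world_size, n_total, seed=1234):
    # independent per-rank RNG stream, so shards are statistically independent
    rng = np.random.default_rng(seed + rank)
    n_local = n_total // world_size
    xs = rng.normal(size=n_local)       # this rank's shard of configurations
    e = local_energy(xs)
    return e.sum(), n_local             # partial sum + count for the reduction

world_size, n_total = 8, 80_000
partials = [worker(r, world_size, n_total) for r in range(world_size)]
energy = sum(s for s, _ in partials) / sum(n for _, n in partials)
print(round(energy, 3))
```

The serial loop over ranks stands in for what would be MPI ranks or GPU workers in practice; the key property is that the reduced mean equals the mean of one long serial chain of the same total length.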
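"Cache-centric optimization for a Transformer-based ansatz" plausibly builds on the standard KV-cache trick for autoregressive models: during sequential sampling, keys and values of earlier positions never change, so caching them makes each new step linear in sequence length instead of recomputing full attention. A toy single-head sketch under that assumption (the weights and sequence are random stand-ins; the paper's memory management is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    # softmax attention of one query over the cached keys/values
    w = np.exp(q @ K.T / np.sqrt(d))
    return (w / w.sum()) @ V

xs = rng.normal(size=(6, d))            # a sampled sequence of 6 "tokens"

# Incremental pass: append this step's K/V to the cache, attend once.
K_cache, V_cache, cached_out = [], [], []
for x in xs:
    K_cache.append(x @ Wk)
    V_cache.append(x @ Wv)
    cached_out.append(attend(x @ Wq, np.array(K_cache), np.array(V_cache)))

# Full causal recompute for the last position must match the cached result.
full_last = attend(xs[-1] @ Wq, xs @ Wk, xs @ Wv)
print(np.allclose(cached_out[-1], full_last))   # → True
```

The payoff at scale is exactly the stable memory footprint the abstract claims to engineer: the cache grows predictably with sequence length rather than being recomputed and reallocated at every sampling step.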