🤖 AI Summary
This work addresses a critical performance bottleneck in hybrid quantum–HPC algorithms, where the computation of ground- and excited-state energies of molecular Hamiltonians is often limited by the classical diagonalization step. The study presents the first systematic application of OpenMP Offload–based GPU acceleration to sample-based quantum diagonalization, integrating the Davidson algorithm with electronic configuration screening. Implemented across six heterogeneous supercomputing platforms—including Frontier—the resulting high-performance computing workflow demonstrates both efficiency and portability. On a single node, the approach achieves approximately two orders of magnitude speedup in diagonalization performance, reducing classical post-processing times from several hours to mere minutes and thereby significantly accelerating the overall quantum–HPC hybrid workflow.
📝 Abstract
Hybrid quantum-HPC algorithms advance research by delegating complex tasks to quantum processors and using HPC systems to orchestrate workflows and complementary computations. Sample-based quantum diagonalization (SQD) is a hybrid quantum-HPC method in which information from a molecular Hamiltonian is encoded into a quantum circuit for evaluation on a quantum computer. A set of measurements on the quantum computer yields electronic configurations that are filtered on the classical computer, which also performs the diagonalization in the selected subspace and identifies the configurations to be carried over to the next step of an iterative process. Diagonalization is the most demanding task for the classical computer. Previous studies used the Fugaku supercomputer and a highly scalable diagonalization code designed for CPUs. Building on that previous work, we describe our efforts to enable efficient, scalable, and portable diagonalization on heterogeneous systems that use GPUs as the main compute engines. GPUs provide massive on-device thread-level parallelism that is well aligned with the algorithms used for diagonalization. We focus on the computation of ground-state energies and wavefunctions using the Davidson algorithm with a selected set of electron configurations. We describe the offload strategy, code transformations, and data movement, with examples of measurements on the Frontier supercomputer and five other GPU-accelerated systems. Our measurements show that GPUs provide a performance boost of the order of 100x on a per-node basis. This dramatically expedites the diagonalization step, which is essential for extracting ground- and excited-state energies, bringing the classical processing time down from hours to minutes.
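For readers unfamiliar with the Davidson algorithm mentioned above, the following is a minimal dense NumPy sketch of the textbook method for the lowest eigenpair of a real symmetric matrix. It is not the authors' implementation: the function name `davidson_lowest`, its parameters, and the small test matrix are hypothetical, and the sketch uses a dense matrix on the CPU, whereas the work described here acts on a Hamiltonian restricted to selected electron configurations and offloads the computation to GPUs.

```python
import numpy as np


def davidson_lowest(H, tol=1e-8, max_iter=200):
    """Textbook Davidson iteration for the lowest eigenpair of a real
    symmetric matrix H (illustrative sketch with hypothetical names)."""
    n = H.shape[0]
    diag = np.diag(H).copy()

    # Start from the unit vector on the smallest diagonal element.
    v0 = np.zeros(n)
    v0[np.argmin(diag)] = 1.0
    V = v0[:, None]   # orthonormal search-space basis, one column per vector
    HV = H @ V        # H applied to each basis vector

    for _ in range(max_iter):
        # Rayleigh-Ritz step: diagonalize H projected onto span(V).
        T = V.T @ HV
        evals, evecs = np.linalg.eigh(T)
        theta, s = evals[0], evecs[:, 0]

        x = V @ s                  # current eigenvector estimate
        r = HV @ s - theta * x     # residual H x - theta x
        if np.linalg.norm(r) < tol:
            break

        # Diagonal (Davidson) preconditioner; guard against tiny denominators.
        denom = diag - theta
        denom[np.abs(denom) < 1e-8] = 1e-8
        t = r / denom
        t /= np.linalg.norm(t)

        # Orthogonalize against the basis (two passes for numerical stability).
        for _ in range(2):
            t = t - V @ (V.T @ t)
        nrm = np.linalg.norm(t)
        if nrm < 1e-10:
            break                  # no new direction to add
        t /= nrm

        # Expand the basis; the product H @ t dominates the cost for large H.
        V = np.hstack([V, t[:, None]])
        HV = np.hstack([HV, (H @ t)[:, None]])

    return theta, x


if __name__ == "__main__":
    # Small diagonally dominant test matrix (made-up data, not a Hamiltonian).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 100))
    H = np.diag(np.arange(1.0, 101.0)) + 0.01 * (A + A.T)
    energy, vec = davidson_lowest(H)
    print("lowest eigenvalue estimate:", energy)
    print("residual norm:", np.linalg.norm(H @ vec - energy * vec))
```

In this sketch the repeated product `H @ t` is the dominant cost as the matrix grows, which is the kind of regular, data-parallel work that maps well onto GPU threads.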