Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine

📅 2025-07-15
🤖 AI Summary
To address memory-bandwidth bottlenecks—particularly in sparse matrix computations—that impede performance in scientific computing, this paper proposes a mixed-precision acceleration framework tailored for exascale GPU supercomputing platforms. The method synergistically combines double-precision (FP64) with single- or half-precision (FP32/FP16) arithmetic, employing low-precision data formats in core iterative steps while preserving numerical robustness via high-precision residual correction. It integrates an optimized GMRES solver with customized sparse matrix storage and memory-access strategies. This work presents the first end-to-end mixed-precision sparse linear solver deployed on modern GPU-based exascale systems and introduces HPG-MxP, a lightweight benchmark for mixed-precision sparse solvers. Experiments demonstrate a 1.6× speedup over FP64-only baselines while maintaining solution accuracy, significantly enhancing practical throughput for memory-bound scientific simulations. The approach delivers a deployable, production-ready solution to the “memory wall” challenge.

📝 Abstract
Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by dense matrix operations have seen substantial speedups by utilizing low precision formats such as FP16. However, a majority of scientific simulation applications are memory bandwidth limited. Beyond preliminary studies, the practical gain from using mixed-precision algorithms on a given HPC system is largely unclear. The High Performance GMRES Mixed Precision (HPG-MxP) benchmark has been proposed to measure the useful performance of a HPC system on sparse matrix-based mixed-precision applications. In this work, we present a highly optimized implementation of the HPG-MxP benchmark for an exascale system and describe our algorithm enhancements. We show for the first time a speedup of 1.6x using a combination of double- and single-precision on modern GPU-based supercomputers.
Problem

Research questions and friction points this paper is trying to address.

Evaluate mixed-precision benefits for memory-bound scientific applications
Optimize HPG-MxP benchmark for exascale sparse matrix computations
Measure speedup of mixed-precision algorithms on GPU supercomputers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed-precision algorithms for scientific computing
HPG-MxP benchmark for sparse matrix applications
Optimized exascale implementation with 1.6x speedup
Aditya Kashi
Oak Ridge National Laboratory
Scalable numerical methods, high-performance computing
Nicholson Koukpaizan
Oak Ridge National Laboratory, National Center for Computational Sciences
Hao Lu
Oak Ridge National Laboratory, National Center for Computational Sciences
Michael Matheson
Oak Ridge National Laboratory, National Center for Computational Sciences
Sarp Oral
Oak Ridge National Laboratory
HPC, Parallel I/O, Storage
Feiyi Wang
Distinguished Research Scientist & Group Leader, Analytics and AI Methods at Scale, NCCS/ORNL
HPC, AI for Science at Scale