🤖 AI Summary
To address the memory-bandwidth bottleneck that limits performance in scientific computing, particularly in sparse matrix computations, this paper proposes a mixed-precision acceleration framework for exascale GPU supercomputing platforms. The method combines double-precision (FP64) with single- or half-precision (FP32/FP16) arithmetic, using low-precision formats in the core iterative steps while preserving numerical robustness through high-precision residual correction. It integrates an optimized GMRES solver with customized sparse matrix storage and memory-access strategies. The work presents the first end-to-end mixed-precision sparse linear solver deployed on a modern GPU-based exascale system and introduces HPG-MxP, a lightweight benchmark for mixed-precision sparse solvers. Experiments demonstrate a 1.6x speedup over an FP64-only baseline while maintaining solution accuracy, substantially improving practical throughput for memory-bound scientific simulations and offering a deployable, production-ready response to the "memory wall" challenge.
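The core idea of "low-precision inner work, high-precision residual correction" can be illustrated with a minimal iterative-refinement sketch. This is not the paper's GMRES-based solver or its sparse storage scheme; it is a simplified dense example showing how an FP32 inner solve combined with FP64 residual accumulation can recover double-precision accuracy:

```python
import numpy as np

def mixed_precision_refine(A, b, iters=20):
    """Illustrative mixed-precision iterative refinement (a simplified
    stand-in for the paper's GMRES-IR approach): the inner solve runs in
    FP32, while residuals and the solution are accumulated in FP64."""
    A32 = A.astype(np.float32)          # low-precision copy for inner solves
    x = np.zeros_like(b, dtype=np.float64)
    for _ in range(iters):
        r = b - A @ x                   # residual computed in FP64
        # Inner solve for the correction, done entirely in FP32
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)       # correction accumulated in FP64
    return x
```

The memory-bandwidth argument is that the inner solve, which dominates the work, reads matrix data at half the byte count of FP64, while the cheap outer residual step restores full accuracy for well-conditioned problems.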
📝 Abstract
Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by dense matrix operations have seen substantial speedups by utilizing low precision formats such as FP16. However, a majority of scientific simulation applications are memory bandwidth limited. Beyond preliminary studies, the practical gain from using mixed-precision algorithms on a given HPC system is largely unclear.
The High Performance GMRES Mixed Precision (HPG-MxP) benchmark has been proposed to measure the useful performance of an HPC system on sparse matrix-based mixed-precision applications. In this work, we present a highly optimized implementation of the HPG-MxP benchmark for an exascale system and describe our algorithm enhancements. We show for the first time a speedup of 1.6x using a combination of double- and single-precision on modern GPU-based supercomputers.