A Pilot Study on Tunable Precision Emulation via Automatic BLAS Offloading

📅 2025-03-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the inefficiency of executing double-precision matrix multiplication—common in traditional HPC applications such as MuST—on GPUs. We propose a source-code-transparent, tunable-precision emulation method that avoids algorithmic rewriting. Our approach integrates automatic BLAS offloading, INT8 low-bit integer computation, cache-coherent unified memory, and AI-driven adaptive precision scheduling. Crucially, it preserves the original double-precision algorithmic logic while dynamically adapting arithmetic precision and operator characteristics to hardware constraints. This establishes the first “fidelity–efficiency co-design” emulation paradigm, overcoming the limitations of conventional mixed-precision methods that require manual algorithm refactoring. Experiments demonstrate substantial improvements in GPU utilization and execution throughput, alongside controllable trade-offs between numerical accuracy and performance. The framework provides a novel pathway for leveraging AI-accelerated hardware in scientific computing.

📝 Abstract
This study explores the use of automatic BLAS offloading and INT8-based emulation for accelerating traditional HPC workloads on modern GPU architectures. Using low-bitwidth integer units and a cache-coherent Unified Memory Architecture, we emulate double-precision matrix multiplications in the MuST application without any code changes. We find that accuracy depends on both the arithmetic precision and the properties of the operator, which can be addressed through tunable precision emulation. Unlike traditional mixed-precision approaches, this method preserves the original algorithms while optimizing hardware utilization. We showcase the potential of improving accuracy and performance at the same time. This work highlights the promise of AI-driven hardware to transform HPC, advocating for adaptive precision strategies in future scientific computing.
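The INT8 emulation the abstract describes can be illustrated with a small NumPy sketch (not the paper's implementation): each FP64 operand is decomposed into a few signed 8-bit "slices", the slice pairs are multiplied as integer GEMMs with 32-bit accumulation—the operation INT8 tensor units provide natively—and the partial products are recombined in FP64. The slice count and bit width (`num_slices`, `bits`) are illustrative tuning knobs, not the paper's parameters.

```python
import numpy as np

def int8_slices(M, num_slices=3, bits=7):
    """Decompose M ~ sum_k S_k * scale * 2**(-bits*k), with int8 slices S_k."""
    scale = np.max(np.abs(M)) / (2**bits - 1)
    if scale == 0.0:
        scale = 1.0  # all-zero matrix: any scale works
    R = M / scale
    slices = []
    for _ in range(num_slices):
        S = np.rint(R)                   # nearest integer, |S| <= 127
        slices.append(S.astype(np.int8))
        R = (R - S) * 2**bits            # shift residual up for the next slice
    return slices, scale

def emulated_dgemm(A, B, num_slices=3, bits=7):
    """FP64 matmul emulated from int8 x int8 -> int32 partial products."""
    As, sa = int8_slices(A, num_slices, bits)
    Bs, sb = int8_slices(B, num_slices, bits)
    C = np.zeros((A.shape[0], B.shape[1]))
    for i, Sa in enumerate(As):
        for j, Sb in enumerate(Bs):
            # On real hardware this product would run on INT8 units with
            # 32-bit accumulators; NumPy's int32 matmul stands in for that.
            P = Sa.astype(np.int32) @ Sb.astype(np.int32)
            C += P * 2.0 ** (-bits * (i + j))
    return C * (sa * sb)
```

`num_slices` is the tunable-precision knob: more slices recover more mantissa bits of the FP64 operands at the cost of more integer GEMMs.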
Problem

Research questions and friction points this paper is trying to address.

Emulating double-precision matrix multiplications using INT8-based techniques
Optimizing hardware utilization without altering original algorithms
Improving accuracy and performance via tunable precision emulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic BLAS offloading for HPC acceleration
INT8 emulation via low-bitwidth integer units
Tunable precision preserves original algorithms
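The adaptive-dispatch idea behind these contributions can be sketched as follows. This is a hypothetical policy, not the paper's mechanism: the actual framework intercepts BLAS calls transparently at link/load time, and the `float32` cast here merely stands in for the INT8-emulated product on the accelerator. The threshold value is likewise an illustrative assumption.

```python
import numpy as np

# Hypothetical offload policy; a production tool would intercept dgemm
# calls automatically and tune this threshold per GPU.
OFFLOAD_THRESHOLD = 128

def low_precision_gemm(A, B):
    # Stand-in for the INT8-slice emulated product on the accelerator;
    # a float32 round-trip mimics a reduced-precision unit for illustration.
    return (A.astype(np.float32) @ B.astype(np.float32)).astype(np.float64)

def dispatched_gemm(A, B):
    m, k = A.shape
    n = B.shape[1]
    if min(m, n, k) < OFFLOAD_THRESHOLD:
        return A @ B                  # small problem: native FP64 on the host
    return low_precision_gemm(A, B)   # large problem: offload and emulate
```

Small products are returned bit-identical to native FP64, while large ones trade a bounded accuracy loss for accelerator throughput—the controllable accuracy/performance trade-off the summary describes.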
Hang Liu
Texas Advanced Computing Center, The University of Texas at Austin
Junjie Li
Texas Advanced Computing Center, The University of Texas at Austin
Yinzhi Wang
Texas Advanced Computing Center
Seismology · High Performance Computing