🤖 AI Summary
This work addresses the inefficiency of double-precision matrix multiplication on GPUs, a common bottleneck in traditional HPC applications such as MuST. We propose a source-code-transparent, tunable-precision emulation method that avoids algorithmic rewriting. Our approach integrates automatic BLAS offloading, low-bit INT8 integer computation, cache-coherent unified memory, and AI-driven adaptive precision scheduling. Crucially, it preserves the original double-precision algorithmic logic while dynamically adapting arithmetic precision and operator characteristics to hardware constraints. This establishes the first "fidelity–efficiency co-design" emulation paradigm, overcoming the limitations of conventional mixed-precision methods, which require manual algorithm refactoring. Experiments demonstrate substantial improvements in GPU utilization and execution throughput, alongside a controllable trade-off between numerical accuracy and performance. The framework provides a novel pathway for leveraging AI-accelerated hardware in scientific computing.
📝 Abstract
This study explores automatic BLAS offloading and INT8-based emulation for accelerating traditional HPC workloads on modern GPU architectures. Using low-bitwidth integer units and a cache-coherent unified memory architecture, we emulate double-precision matrix multiplications in the MuST application without code changes. We find that accuracy depends on both the arithmetic precision and the properties of the operator, which tunable-precision emulation can address. Unlike traditional mixed-precision approaches, this method preserves the original algorithms while optimizing hardware utilization, and we show that accuracy and performance can improve at the same time. This work highlights the potential of AI-driven hardware to transform HPC and advocates adaptive precision strategies in future scientific computing.
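The abstract does not detail how FP64 matrix products can be emulated with low-bit integer units. As a rough illustration of the general idea (an Ozaki-style splitting into exact integer slice products; this is a minimal sketch with illustrative names and parameters, not the authors' implementation), each FP64 matrix can be scaled to integers, decomposed into signed 8-bit digits, multiplied exactly on integer units, and recombined:

```python
# Illustrative sketch (NOT the paper's code): emulate an FP64 matrix
# multiply using exact products of signed 8-bit integer slices.
# `num_slices` is the tunable-precision knob: more slices, more accuracy.

def split_slices(A, num_slices=4, bits=8):
    """Scale A to integers, then decompose each entry into balanced
    base-2**bits digits; every digit fits a signed 8-bit register."""
    base, half = 2 ** bits, 2 ** (bits - 1)
    amax = max(abs(x) for row in A for x in row) or 1.0
    # Two headroom bits keep all digits representable in num_slices slices.
    scale = 2 ** (bits * num_slices - 2) / amax
    ints = [[round(x * scale) for x in row] for row in A]
    slices = []
    for _ in range(num_slices):
        digits = [[(v + half) % base - half for v in row] for row in ints]
        ints = [[(v - d) // base for v, d in zip(row, drow)]
                for row, drow in zip(ints, digits)]
        slices.append(digits)
    return scale, slices

def imatmul(X, Y):
    """Exact integer matmul; on real hardware this maps to INT8 units
    accumulating into wide integer registers."""
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def emulated_gemm(A, B, num_slices=4, bits=8):
    """Approximate A @ B as a weighted sum of integer slice products."""
    sa, SA = split_slices(A, num_slices, bits)
    sb, SB = split_slices(B, num_slices, bits)
    base = 2 ** bits
    C = [[0.0] * len(B[0]) for _ in A]
    for i, Si in enumerate(SA):
        for j, Sj in enumerate(SB):
            P = imatmul(Si, Sj)          # exact INT8 x INT8 product
            w = float(base ** (i + j)) / (sa * sb)
            for r in range(len(C)):
                for c in range(len(C[0])):
                    C[r][c] += P[r][c] * w
    return C
```

Because each slice product is exact, accuracy is governed only by the initial rounding to `bits * num_slices` bits; dropping high-order `i + j` slice pairs is one natural way to trade precision for speed, which is the kind of knob the tunable-precision emulation described above exposes.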