🤖 AI Summary
To address the performance and energy-efficiency bottlenecks of General Matrix Multiplication (GEMM) on heterogeneous hardware, this work proposes a fine-grained adaptive mixed-precision framework. Unlike conventional coarse-grained approaches that fix precision per layer or per tensor, it dynamically selects the optimal numerical precision (e.g., FP16/FP32/FP64) at the block level, tightly coupling precision selection with hardware-aware block scheduling. The framework integrates the PaRSEC runtime to enable cross-architecture task load balancing and low-overhead precision transitions across ARM CPUs, NVIDIA GPUs, and AMD GPUs, bridging the gap between the numerical robustness an algorithm requires and the computational capability and energy-efficiency characteristics each device offers. Evaluations on supercomputing platforms—including Fugaku, Frontier, and NVIDIA A100 DGX—demonstrate up to 2.1× speedup and 1.8× energy-efficiency improvement over single-precision baselines, while preserving the numerical stability critical for scientific applications.
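The block-level precision selection described above can be illustrated with a minimal heuristic. This sketch is not from the paper; the function name, thresholds, and tolerance parameter are hypothetical, chosen only to show the idea of matching a block's dynamic range and a target relative tolerance against the limits of each IEEE format.

```python
import numpy as np

def select_block_precision(block, rel_tol=1e-6):
    """Pick the narrowest IEEE format (hypothetical heuristic, not the
    paper's policy) whose range and unit roundoff suit this block."""
    amax = np.max(np.abs(block))
    if amax == 0.0:
        return np.float16          # all-zero block: cheapest format suffices
    # FP16 overflows above ~65504 and has unit roundoff ~4.9e-4.
    if amax < 6.5e4 and rel_tol >= 5e-4:
        return np.float16
    # FP32 unit roundoff is ~6e-8, adequate for tolerances down to ~1e-7.
    if rel_tol >= 1e-7:
        return np.float32
    return np.float64
```

A runtime such as PaRSEC could then tag each tile's task with the selected format and schedule it on whichever device executes that format most efficiently.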
📝 Abstract
General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms to leverage mixed-precision computation for improved performance and energy efficiency. This research introduces an adaptive mixed-precision GEMM framework that supports different precision formats at a fine-grained tile/block level. We use the PaRSEC runtime system to balance workloads across heterogeneous architectures. Performance scales well on the ARM CPU-based Fugaku supercomputer, the NVIDIA GPU-based A100 DGX, and the AMD GPU-based Frontier supercomputer. This research aims to enhance computational efficiency and accuracy by bridging algorithmic advancements and hardware innovations, driving transformative progress across a variety of applications.
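The core pattern of a tile-level mixed-precision GEMM can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the function name and tiling scheme are assumptions, and it shows only the numerical idea of multiplying tiles in a reduced precision while accumulating in FP64 to preserve stability.

```python
import numpy as np

def mixed_precision_gemm(A, B, tile=64, tile_dtype=np.float32):
    """Blocked GEMM (illustrative sketch): each tile product is computed
    in `tile_dtype`, while the accumulator C stays in FP64."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.float64)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Cast tiles down, multiply in reduced precision,
                # then accumulate the partial product in FP64.
                a = A[i:i+tile, p:p+tile].astype(tile_dtype)
                b = B[p:p+tile, j:j+tile].astype(tile_dtype)
                C[i:i+tile, j:j+tile] += (a @ b).astype(np.float64)
    return C
```

Comparing the result against a full FP64 `A @ B` shows a small relative error (on the order of the reduced format's roundoff times the square root of the inner dimension), which is the trade-off the adaptive framework navigates per block.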