🤖 AI Summary
This work addresses the high energy consumption incurred when solving large-scale sparse linear systems on multi-GPU clusters. Methodologically, it introduces the first fine-grained runtime energy analysis framework tailored to sparse matrix computations, integrating GPU-accelerated sparse algorithms, low-overhead inter-node communication strategies, and a custom-built high-accuracy energy measurement tool, with a particular focus on reducing redundant data movement. Its key contribution is the systematic identification of inter-node data transfer as the primary energy-efficiency bottleneck, which enables principled co-optimization of computation and communication. Evaluated on clusters comprising thousands of GPUs, the framework achieves, on average, a 32% reduction in solution time and a 41% decrease in energy consumption compared with state-of-the-art libraries (e.g., cuSPARSE, PETSc), while scaling significantly better and enabling scientific computations beyond single-node memory capacity.
📝 Abstract
We investigate the energy efficiency of a library designed for parallel computations with sparse matrices. The library leverages high-performance, energy-efficient Graphics Processing Unit (GPU) accelerators to enable large-scale scientific applications. Our primary development objective was to maximize parallel performance and scalability in solving sparse linear systems whose dimensions far exceed the memory capacity of a single node. To this end, we devised methods that expose a high degree of parallelism while optimizing algorithmic implementations for efficient multi-GPU usage. Previous work has already demonstrated the library's performance efficiency on large-scale systems comprising thousands of NVIDIA GPUs, achieving improvements over state-of-the-art solutions. In this paper, we extend those results by providing energy profiles that address the growing sustainability requirements of modern HPC platforms. We present our methodology and tools for accurate runtime energy measurements of the library's core components and discuss the findings. Our results confirm that optimizing GPU computations and minimizing data movement across memory and computing nodes reduces both time-to-solution and energy consumption. Moreover, we show that the library delivers substantial advantages over comparable software frameworks on standard benchmarks.
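The abstract describes a methodology for accurate runtime energy measurement of the library's components. The paper's actual tool is not shown here, but the underlying principle is common to such profilers: sample device power at runtime (e.g., via NVML's `nvmlDeviceGetPowerUsage` on NVIDIA GPUs, an assumed interface for this illustration) and integrate the samples over time to obtain energy. A minimal sketch, assuming a trace of `(timestamp_s, power_w)` pairs:

```python
def energy_joules(samples):
    """Estimate energy (J) by trapezoidal integration of
    (timestamp_s, power_w) samples collected during a solver phase."""
    total = 0.0
    # Integrate power over each sampling interval: E = ∫ P(t) dt
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total

# Hypothetical example: a phase sampled every 0.1 s at a steady 250 W for 2 s
trace = [(i / 10, 250.0) for i in range(21)]
print(f"{energy_joules(trace):.1f} J")  # ≈ 500.0 J (250 W × 2 s)
```

In practice such tools must also account for sampling overhead and driver-reported power latency; comparing the integrated energy of computation-heavy versus communication-heavy phases is what makes findings like "inter-node data movement dominates energy cost" measurable.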