Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP

📅 2026-03-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Traditional efficiency metrics were designed for homogeneous CPU-based systems and do not capture how resources are used in heterogeneous high-performance computing (HPC) systems that combine CPUs and accelerators. This work extends the POP efficiency model with a hardware-agnostic hierarchy of metrics that evaluates host and device efficiency separately. On the host side, the metrics quantify the effectiveness of hybrid execution and offloading. On the device side, the authors define a multiplicative hierarchy analogous to the host hierarchy, including its Parallel Efficiency branch. The metrics are implemented in TALP, a lightweight monitoring module of the DLB library that reports measurements at runtime and post mortem in both textual and machine-readable formats. Experiments on synthetic benchmarks and three production HPC applications show that the metrics expose inefficiencies in offloading, load balance, and orchestration, giving developers actionable guidance for optimization.
📝 Abstract
The increasing adoption of heterogeneous platforms that combine CPUs with accelerators such as GPUs in high-performance computing (HPC) introduces new challenges for performance analysis and optimization. Traditional efficiency metrics, such as those proposed by the Performance Optimization and Productivity (POP) Center of Excellence, were designed primarily for homogeneous CPU-based systems and therefore do not capture the complex interactions between host and device resources. In this work, we extend the POP efficiency framework to heterogeneous architectures by introducing a new hierarchy of metrics that separately evaluate host and device efficiency. On the host side, we quantify the effectiveness of hybrid execution and offloading operations. On the device side, we propose a multiplicative hierarchy analogous to the host hierarchy and define its Parallel Efficiency branch. Beyond their definition and formulation, we present the implementation of these metrics in the TALP module of the DLB library. TALP is a lightweight monitoring library that provides measurements both post mortem and at runtime, with outputs available in textual and machine-readable formats. We validate the proposed framework through synthetic benchmarks and three production HPC applications, demonstrating how the metrics expose inefficiencies in offloading, load balance, and orchestration. Results show that the extended TALP metrics provide actionable insights to guide developers in optimizing heterogeneous HPC codes.
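To make the idea of a multiplicative hierarchy concrete, the sketch below computes the classic host-side POP decomposition, in which Parallel Efficiency is the product of Load Balance and Communication Efficiency. It is an illustration under stated assumptions, not the paper's implementation: the function name `pop_parallel_efficiency` and the sample timings are hypothetical, TALP's API is not used, and the device-side metrics the paper defines are not reproduced here.

```python
from statistics import mean


def pop_parallel_efficiency(useful_times, elapsed):
    """Return (load_balance, communication_efficiency, parallel_efficiency).

    useful_times: per-process useful computation time, in seconds.
    elapsed: wall-clock time of the measured region, in seconds.
    Hypothetical helper for illustration; not part of TALP or DLB.
    """
    if not useful_times or elapsed <= 0:
        raise ValueError("need at least one process and a positive elapsed time")
    longest = max(useful_times)
    if longest <= 0:
        raise ValueError("at least one process must have positive useful time")

    # Load Balance: how evenly useful work is spread across processes.
    load_balance = mean(useful_times) / longest
    # Communication Efficiency: share of the run the busiest process computes.
    communication_efficiency = longest / elapsed
    # The parent metric is the product of its children.
    parallel_efficiency = load_balance * communication_efficiency
    return load_balance, communication_efficiency, parallel_efficiency


# Made-up timings for four processes in a 10-second region.
useful = [8.0, 6.0, 4.0, 6.0]
lb, ce, pe = pop_parallel_efficiency(useful, elapsed=10.0)
print(f"LB={lb:.2f} CE={ce:.2f} PE={pe:.2f}")  # prints "LB=0.75 CE=0.80 PE=0.60"
```

Because the children multiply to give the parent, a low Parallel Efficiency can be traced to whichever factor is low. In this example both imbalance (0.75) and time spent outside useful computation (0.80) contribute to the overall 0.60.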
Problem

Research questions and friction points this paper is trying to address.

heterogeneous computing
performance metrics
efficiency analysis
accelerators
HPC
Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneous computing
efficiency metrics
performance analysis
TALP
accelerator offloading