AI Summary
To address low per-GPU utilization and the resulting poor return on hardware investment in heterogeneous multi-GPU systems, this paper proposes a data-driven analytical framework that establishes, for the first time, interpretable correlations between optimization strategies and GPU resource usage patterns. The method integrates hardware performance counter profiling, multi-objective correlation modeling, and scientific proxy application benchmarks to construct a metric suite that characterizes application-device interactions. Unlike prior work, which focuses primarily on performance gains, this approach systematically examines how optimizations affect hardware resource usage and average GPU utilization. Applying the identified optimizations to a proxy application reduces execution time by up to 29.6%, raises average GPU utilization by up to 5.3%, and lowers power consumption by up to 26.5%. These results indicate that device utilization and application performance can be optimized jointly in heterogeneous accelerator systems.
Abstract
With heterogeneous systems, the number of GPUs per chip increases to provide the computational capabilities needed for solving science at a nanoscopic scale. However, low utilization of individual GPUs undermines the case for investing in more of these expensive accelerators. While related work develops optimizations for improving application performance, none studies how these optimizations impact hardware resource usage or average GPU utilization. This paper takes a data-driven analysis approach to address this gap by (1) characterizing how hardware resource usage affects device utilization, execution time, or both, (2) presenting a multi-objective metric to identify important application-device interactions that can be optimized to improve device utilization and application performance jointly, (3) studying the hardware resource usage behaviors of several optimizations for a benchmark application, and finally (4) identifying optimization opportunities for several scientific proxy applications based on their hardware resource usage behaviors. Furthermore, we demonstrate the applicability of our methodology by applying the identified optimizations to a proxy application, which improves execution time, device utilization, and power consumption by up to 29.6%, 5.3%, and 26.5%, respectively.
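To make steps (1) and (2) concrete, the following is a minimal illustrative sketch, not the paper's actual metric. It assumes a hardware resource usage counter is correlated separately with device utilization and with execution time, and that the two correlations are folded into a single score. The function names `pearson` and `joint_score`, the scoring formula, and the sample values are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def joint_score(counter, utilization, exec_time):
    """Hypothetical multi-objective score in [-1, 1].

    The score is high when a larger counter value goes together with
    higher device utilization AND lower execution time. This formula is
    an assumption made for illustration only.
    """
    r_util = pearson(counter, utilization)
    r_time = pearson(counter, exec_time)
    return (r_util - r_time) / 2.0


# Synthetic, made-up measurements across four runs (not data from the paper).
counter_values = [0.2, 0.4, 0.6, 0.8]      # e.g. a normalized resource usage counter
utilization_pct = [30.0, 40.0, 50.0, 60.0]  # average GPU utilization per run
exec_time_s = [10.0, 8.0, 6.0, 4.0]         # execution time per run

score = joint_score(counter_values, utilization_pct, exec_time_s)
print(f"joint score: {score:.3f}")
```

Under this assumed formulation, a counter whose score is close to 1 would mark an application-device interaction worth optimizing for both objectives at once, while a score near 0 would suggest a trade-off or no clear relationship.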