Measuring GPU utilization one level deeper

📅 2025-01-28
🤖 AI Summary
Multi-application GPU co-location suffers from both performance instability and resource underutilization. To address this, the paper introduces the first kernel-level, fine-grained framework for quantifying resource interference across multiple hardware layers (compute units, L1/L2 caches, and memory bandwidth), overcoming the limitations of conventional coarse-grained, utilization-based modeling. The framework combines micro-benchmarks, hardware performance counter sampling, kernel-level isolation experiments, and interference modeling to characterize interference behavior reproducibly across these subsystems. Building on this characterization, the paper designs a dynamic co-location scheduler with strict service-level objective (SLO) guarantees, which achieves over 35% improvement in aggregate GPU utilization while maintaining quality-of-service requirements. The work provides both theoretical foundations and empirical validation for predictable, high-performance GPU resource sharing.

📝 Abstract
GPU hardware is vastly underutilized. Even resource-intensive AI applications have diverse resource profiles that often leave parts of GPUs idle. While colocating applications can improve utilization, current spatial sharing systems lack performance guarantees. Providing predictable performance guarantees requires a deep understanding of how applications contend for shared GPU resources such as block schedulers, compute units, L1/L2 caches, and memory bandwidth. We propose a methodology to profile resource interference of GPU kernels across these dimensions and discuss how to build GPU schedulers that provide strict performance guarantees while colocating applications to minimize cost.
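The idea of profiling kernels per resource dimension and colocating them only when performance guarantees still hold can be illustrated with a small sketch. This is not the paper's model: the `KernelProfile` type, the per-resource `demand` fractions, and the "most oversubscribed resource sets the slowdown" rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

# Resource dimensions named in the abstract.
RESOURCES = ("block_scheduler", "compute_units", "l1_cache",
             "l2_cache", "memory_bandwidth")


@dataclass(frozen=True)
class KernelProfile:
    """Hypothetical per-kernel profile; field names are illustrative."""
    name: str
    solo_latency_ms: float   # latency when the kernel runs alone
    slo_latency_ms: float    # latency bound the kernel must meet
    # Assumed fraction (0..1) of each resource used when running alone.
    demand: dict = field(default_factory=dict)


def predicted_slowdown(a: KernelProfile, b: KernelProfile) -> float:
    """Toy interference model (an assumption, not a measured one):
    the slowdown equals the combined demand on the most
    oversubscribed resource, and is never below 1.0."""
    worst = 1.0
    for r in RESOURCES:
        combined = a.demand.get(r, 0.0) + b.demand.get(r, 0.0)
        worst = max(worst, combined)
    return worst


def can_colocate(a: KernelProfile, b: KernelProfile) -> bool:
    """Admit the pair only if both kernels still meet their SLO
    under the predicted slowdown."""
    s = predicted_slowdown(a, b)
    return all(k.solo_latency_ms * s <= k.slo_latency_ms for k in (a, b))


gemm = KernelProfile("gemm", 2.0, 3.0,
                     {"compute_units": 0.75, "memory_bandwidth": 0.25})
copy = KernelProfile("copy", 1.0, 1.5,
                     {"compute_units": 0.25, "memory_bandwidth": 0.5})
attn = KernelProfile("attn", 4.0, 4.5,
                     {"compute_units": 0.5, "memory_bandwidth": 0.75})
```

With these made-up profiles, `gemm` and `copy` have complementary demands and can share the GPU, while `gemm` and `attn` oversubscribe the compute units and would push `attn` past its latency bound. A real profiler would replace the assumed `demand` fractions with measurements from hardware performance counters and isolation experiments.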
Problem

Research questions and friction points this paper is trying to address.

Multi-application GPU co-location
GPU resource management
Performance stabilization
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU resource management
Performance stability
Cost efficiency