🤖 AI Summary
Problem: Fine-grained, cross-platform, real-time monitoring of GPU resources, particularly peak GPU memory usage and computational utilization, lacks adequate tooling in Unix/Linux environments.
Method: This paper introduces the first lightweight, dependency-free Python tool leveraging the NVIDIA Management Library (NVML) API. It employs multithreading and process-hooking techniques to enable low-overhead (average 0.3%) background sampling and precise peak capture of CPU/GPU utilization and system/GPU memory consumption.
Contribution/Results: The tool unifies analysis across desktop and HPC environments with high accuracy (GPU memory peak error <2%). It enables job-level GPU resource profiling—the first such capability for fine-grained, runtime GPU characterization in HPC settings—thereby addressing a critical gap in production-grade GPU observability. The implementation is open-source and has been integrated into multiple scientific computing pipelines.
📝 Abstract
Determining the maximum usage of random-access memory (RAM), both on the motherboard and on a graphics processing unit (GPU), over the lifetime of a computing task can be extremely useful for troubleshooting points of failure as well as for optimizing memory utilization, especially in a high-performance computing (HPC) setting. While tools exist for tracking compute time and RAM, including the job management tools themselves, to our knowledge there is currently no sufficient solution for tracking GPU usage. We present gpu_tracker, a Python package that tracks the computational resource usage of a task while running in the background, including the real compute time the task takes to complete, its maximum RAM usage, and its maximum GPU RAM usage, specifically for Nvidia GPUs. We demonstrate that gpu_tracker can seamlessly track computational resource usage with minimal overhead, both in desktop and HPC execution environments.
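The background-sampling approach described above can be sketched with the Python standard library alone. This is a minimal illustration of the pattern (a polling thread that records a peak value while the monitored task runs), not gpu_tracker's actual implementation; the `fake_gpu_memory` function is a hypothetical stand-in for a real NVML memory query.

```python
import threading
import time
from itertools import count

class PeakSampler:
    """Poll a sampler function from a background thread at a fixed
    interval and record the peak value observed."""

    def __init__(self, sample_fn, interval=0.01):
        self.sample_fn = sample_fn  # in gpu_tracker's setting, an NVML memory query
        self.interval = interval
        self.peak = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Sample, keep the running maximum, then sleep until the next tick
        # (or until the tracker is stopped, whichever comes first).
        while not self._stop.is_set():
            self.peak = max(self.peak, self.sample_fn())
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()

# Hypothetical stand-in for an NVML query: replays a fixed memory curve (MiB),
# then holds at the final reading.
readings = [100, 400, 900, 300]
calls = count()
def fake_gpu_memory():
    return readings[min(next(calls), len(readings) - 1)]

with PeakSampler(fake_gpu_memory, interval=0.01) as sampler:
    time.sleep(0.3)  # stand-in for the monitored task running to completion

# sampler.peak now holds the high-water mark of the curve (900 here)
```

Sleeping via `Event.wait` rather than `time.sleep` lets the sampler wake up immediately when the task finishes, so the tracker adds at most one polling interval of latency while keeping per-sample overhead low.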