MT4G: A Tool for Reliable Auto-Discovery of NVIDIA and AMD GPU Compute and Memory Topologies

📅 2025-11-08
🏛️ Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
🤖 AI Summary
GPU topology information, such as compute and memory hierarchy, cache sizes, bandwidths, and physical layout, is hard to obtain, incomplete, and vendor-specific, which hinders performance modeling and optimization in HPC and AI systems. To address this, the authors present MT4G, an open-source, vendor-agnostic (NVIDIA and AMD) tool for automatic discovery of GPU compute and memory topologies. It combines existing APIs with a suite of over 50 microbenchmarks and applies statistical methods, including the Kolmogorov-Smirnov test, to reliably identify attributes that cannot be queried programmatically. The tool is demonstrated on ten different GPUs and integrated into three workflows: GPU performance modeling, GPUscout bottleneck analysis, and dynamic resource partitioning.

📝 Abstract
Understanding GPU topology is essential for performance-related tasks in HPC or AI. Yet, unlike for CPUs with tools like hwloc, GPU information is hard to come by, incomplete, and vendor-specific. In this work, we address this gap and present MT4G, an open-source and vendor-agnostic tool that automatically discovers GPU compute and memory topologies and configurations, including cache sizes, bandwidths, and physical layouts. MT4G combines existing APIs with a suite of over 50 microbenchmarks, applying statistical methods, such as the Kolmogorov-Smirnov test, to automatically and reliably identify otherwise programmatically unavailable topological attributes. We showcase MT4G's universality on ten different GPUs and demonstrate its impact through integration into three workflows: GPU performance modeling, GPUscout bottleneck analysis, and dynamic resource partitioning. These scenarios highlight MT4G's role in understanding system performance and characteristics across NVIDIA and AMD GPUs, providing an automated, portable solution for modern HPC and AI systems.
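To make the idea of pairing microbenchmarks with a Kolmogorov-Smirnov test more concrete, here is a minimal sketch. It is not MT4G's actual implementation: the latency data are synthetic, and the function names, the 128 KiB capacity, and the 0.5 threshold are all hypothetical choices made for illustration. The sketch compares latency distributions measured at consecutive working-set sizes and reports the last size before the distribution shifts, which is one plausible way a cache capacity could be inferred.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both cursors past every value <= x, so i/len(a) and
        # j/len(b) are the empirical CDF values at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def simulate_latencies(size_kib, cache_kib=128, n=200, rng=None):
    """Synthetic stand-in for a pointer-chase microbenchmark (assumption, not real data)."""
    rng = rng or random.Random(0)
    # Hypothetical model: fast hits while the working set fits, slow misses otherwise.
    mean, sd = (30.0, 2.0) if size_kib <= cache_kib else (200.0, 10.0)
    return [rng.gauss(mean, sd) for _ in range(n)]


def detect_boundary(sizes, samples, threshold=0.5):
    """Return the last size before the latency distribution changes, or None."""
    for prev, cur in zip(sizes, sizes[1:]):
        if ks_statistic(samples[prev], samples[cur]) > threshold:
            return prev
    return None


rng = random.Random(0)
sizes = [16, 32, 64, 128, 256, 512]  # working-set sizes in KiB
samples = {s: simulate_latencies(s, rng=rng) for s in sizes}
boundary = detect_boundary(sizes, samples)
print("estimated capacity (KiB):", boundary)
```

Comparing whole distributions rather than means makes the detection less sensitive to outliers in individual measurements. A real tool would take the samples from on-device timing loops and would likely derive the threshold from a significance level instead of a fixed constant.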
Problem

Research questions and friction points this paper is trying to address.

Automating GPU topology discovery for HPC and AI systems
Addressing vendor-specific limitations in GPU information access
Providing reliable identification of unavailable GPU topological attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source tool auto-discovers GPU topologies
Combines APIs with microbenchmarks for reliability
Applies statistical methods to identify hidden attributes
Stepan Vanecek
Technical University of Munich, Garching, Germany
Manuel Walter Mußbacher
Technical University of Munich, Garching, Germany
Dominik Größler
Technical University of Munich, Garching, Germany
Urvij Saroliya
Technical University of Munich, Garching, Germany
Martin Schulz
Technical University of Munich
Computer Architecture and Parallel Systems