📝 Abstract
As we reach exascale, production High Performance Computing (HPC) systems are increasing in complexity. These systems now comprise multiple heterogeneous computing components (CPUs and GPUs) utilized through diverse, often vendor-specific programming models. As application developers and programming-model experts develop higher-level, portable programming models for these systems, debugging and performance optimization require understanding how multiple programming models stacked on top of each other interact with one another. This paper discusses THAPI (Tracing Heterogeneous APIs), a portable, programming-model-centric tracing framework: by capturing comprehensive API call details across layers of the HPC software stack, THAPI enables fine-grained understanding and analysis of how applications interact with programming models and heterogeneous hardware. By leveraging a state-of-the-art tracing framework, the Linux Trace Toolkit Next Generation (LTTng), and capturing much more than tracing toolkits focused on function names and timestamps, this approach enables us to diagnose performance bottlenecks across the software stack, optimize application behavior, and debug programming model implementation issues.
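To illustrate the kind of analysis that cross-layer API traces enable, the sketch below pairs begin/end events from a trace into nested call spans, so that a runtime call issued by a higher-level programming model appears under the API call that triggered it. This is a minimal, hypothetical example: the event tuples, function names, and layout are illustrative assumptions, not THAPI's actual trace format.

```python
# Hypothetical trace events: (timestamp_ns, api, function, phase).
# An OpenMP target offload that internally drives a Level Zero call.
events = [
    (100, "OpenMP", "omp_target", "begin"),
    (120, "L0", "zeCommandQueueExecuteCommandLists", "begin"),
    (300, "L0", "zeCommandQueueExecuteCommandLists", "end"),
    (340, "OpenMP", "omp_target", "end"),
]

def reconstruct_spans(events):
    """Pair begin/end events with a stack.

    Returns (api, function, start_ns, duration_ns, nesting_depth) tuples,
    giving a call-chain view across programming-model layers.
    """
    stack, spans = [], []
    for ts, api, fn, phase in events:
        if phase == "begin":
            stack.append((ts, api, fn))
        else:
            start, open_api, open_fn = stack.pop()
            assert (open_api, open_fn) == (api, fn), "mismatched begin/end"
            # Depth after popping == nesting level of this span.
            spans.append((api, fn, start, ts - start, len(stack)))
    return spans

if __name__ == "__main__":
    for api, fn, start, dur, depth in sorted(reconstruct_spans(events),
                                             key=lambda s: s[2]):
        print("  " * depth + f"{api}:{fn} {dur} ns")
```

Run on the sample events, this prints the OpenMP call with the Level Zero call indented beneath it, making it visible that most of the offload's time is spent inside the lower-level runtime.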