🤖 AI Summary
This work addresses a common limitation in performance modeling for high-performance computing: only a restricted number of hardware performance counters can be collected simultaneously, which narrows the feature set available to a model. To overcome this without relying on multiplexing, the authors propose a heuristic method that merges execution traces from multiple runs, each instrumented with a different set of counters. The method matches computation bursts across executions by analyzing MPI structure, timing, and communication patterns, and produces a unified synthetic trace containing all merged counters. This trace can be used both to train machine learning–based performance models on a richer feature space and for conventional performance analysis. Validation on the MareNostrum5 machine with a range of kernels and real applications shows that the merged counters retain acceptable accuracy, depending on the application, and can be used directly for training without prior counter selection.
📝 Abstract
This work extends a framework for predicting the performance of High-Performance Computing (HPC) workloads using Machine Learning (ML). A common limitation in performance modeling is the restricted number of hardware counters that can be collected simultaneously. To address this, we propose a heuristic-based methodology to merge execution traces from multiple runs, each instrumented with a different set of hardware counters. Our approach matches computation bursts across executions by analyzing MPI structure, timing, and communication patterns. This process enables the construction of a unified dataset that includes a wider set of hardware features without relying on multiplexing. The output is a new synthetic trace with all merged counters, which can be used both for HPC performance prediction and for conventional performance analysis. The methodology has been validated on the MareNostrum5 machine with a range of kernels and real applications. Results show that the merged counters maintain acceptable accuracy, with the level of accuracy depending on the application, and can be used directly to train ML models on a richer feature space without prior counter selection.
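To make the burst-matching idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a hypothetical `Burst` record and a hypothetical `merge_runs` helper. Bursts from two runs are paired by MPI rank and by their position in that rank's sequence. A pair is accepted only if the surrounding MPI calls agree and the durations fall within a relative tolerance. The tolerance value, the averaging of durations, and the counter names in the usage note are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Burst:
    """Hypothetical record for one computation burst between two MPI calls."""
    rank: int           # MPI rank that executed the burst
    prev_call: str      # MPI call that preceded the burst
    next_call: str      # MPI call that followed the burst
    duration_us: float  # burst duration in microseconds
    counters: dict      # hardware counters collected in this run


def group_by_rank(run):
    """Group bursts by rank, preserving their order within each rank."""
    groups = {}
    for burst in run:
        groups.setdefault(burst.rank, []).append(burst)
    return groups


def merge_runs(run_a, run_b, rel_tol=0.2):
    """Merge two runs instrumented with different counter sets.

    Assumed heuristic: pair bursts by rank and sequence position, then keep
    the pair only if the MPI context matches and the durations differ by at
    most rel_tol relative to the longer of the two.
    """
    groups_a, groups_b = group_by_rank(run_a), group_by_rank(run_b)
    merged = []
    for rank, bursts_a in groups_a.items():
        for a, b in zip(bursts_a, groups_b.get(rank, [])):
            same_context = (a.prev_call, a.next_call) == (b.prev_call, b.next_call)
            longest = max(a.duration_us, b.duration_us)
            close = longest == 0 or abs(a.duration_us - b.duration_us) / longest <= rel_tol
            if same_context and close:
                merged.append(Burst(
                    rank=rank,
                    prev_call=a.prev_call,
                    next_call=a.next_call,
                    duration_us=(a.duration_us + b.duration_us) / 2,
                    counters={**a.counters, **b.counters},  # union of both counter sets
                ))
    return merged
```

For example, if one run records `PAPI_TOT_INS` and another records `PAPI_L1_DCM` for the same burst, the merged burst carries both counters. Pairs that fail the context or timing check are dropped here; a fuller treatment would need a policy for such unmatched bursts.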