GraphPerf-RT: A Graph-Driven Performance Model for Hardware-Aware Scheduling of OpenMP Codes

📅 2025-12-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Predicting OpenMP workload performance on heterogeneous embedded SoCs is challenging due to strong couplings among task DAG structure, irregular control flow, cache/branch behavior, and thermal dynamics. To address this, we propose the first heterogeneous graph neural network (HGNN) surrogate model that jointly encodes task graph topology, CFG semantics, and real-time hardware states—including DVFS settings, temperature, and core utilization. We unify these three heterogeneous information sources into a typed-edge heterogeneous graph and introduce a multi-task evidential learning head with a Normal-Inverse-Gamma distribution for calibrated uncertainty quantification and risk-aware prediction. Evaluated on Jetson TX2, Orin NX, and RUBIK Pi platforms, our model achieves R² > 0.95 and expected calibration error (ECE) < 0.05. Integrated into the MAMBRL-D3QN scheduler, it reduces makespan by 66% and energy consumption by 82%, significantly outperforming model-free baselines.

📝 Abstract
Performance prediction for OpenMP workloads on heterogeneous embedded SoCs is challenging due to complex interactions between task DAG structure, control-flow irregularity, cache and branch behavior, and thermal dynamics; classical heuristics struggle under workload irregularity, tabular regressors discard structural information, and model-free RL risks overheating resource-constrained devices. We introduce GraphPerf-RT, the first surrogate that unifies task DAG topology, CFG-derived code semantics, and runtime context (per-core DVFS, thermal state, utilization) in a heterogeneous graph representation with typed edges encoding precedence, placement, and contention. Multi-task evidential heads predict makespan, energy, cache and branch misses, and utilization with calibrated uncertainty (Normal-Inverse-Gamma), enabling risk-aware scheduling that filters low-confidence rollouts. We validate GraphPerf-RT on three embedded ARM platforms (Jetson TX2, Jetson Orin NX, RUBIK Pi), achieving R² > 0.95 with well-calibrated uncertainty (ECE < 0.05). To demonstrate end-to-end scheduling utility, we integrate the surrogate with four RL methods on Jetson TX2: single-agent model-free (SAMFRL), single-agent model-based (SAMBRL), multi-agent model-free (MAMFRL-D3QN), and multi-agent model-based (MAMBRL-D3QN). Experiments across 5 seeds (200 episodes each) show that MAMBRL-D3QN with GraphPerf-RT as the world model achieves 66% makespan reduction (0.97 ± 0.35 s) and 82% energy reduction (0.006 ± 0.005 J) compared to model-free baselines, demonstrating that accurate, uncertainty-aware surrogates enable effective model-based planning on thermally constrained embedded systems.
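The abstract's evidential heads emit Normal-Inverse-Gamma (NIG) parameters rather than point estimates, which lets the scheduler separate data noise from model uncertainty and discard low-confidence rollouts. The paper does not publish its implementation; the sketch below is a minimal pure-Python illustration of the standard NIG evidential-regression formulation (the NLL and the aleatoric/epistemic decomposition), with the parameter names `gamma`, `nu`, `alpha`, `beta` and the `risk_filter` helper being illustrative assumptions, not the authors' code.

```python
import math

def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of observation y under a Normal-Inverse-Gamma
    evidential distribution (gamma: location, nu: evidence, alpha/beta: IG shape/scale)."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))

def nig_moments(gamma, nu, alpha, beta):
    """Point prediction plus uncertainty decomposition (requires alpha > 1)."""
    mean = gamma
    aleatoric = beta / (alpha - 1.0)           # irreducible data noise
    epistemic = beta / (nu * (alpha - 1.0))    # model uncertainty; shrinks as evidence nu grows
    return mean, aleatoric, epistemic

def risk_filter(nig_predictions, max_epistemic):
    """Risk-aware gating: keep only rollouts whose epistemic variance is low enough."""
    return [p for p in nig_predictions
            if nig_moments(*p)[2] <= max_epistemic]
```

A scheduler would call `risk_filter` on the surrogate's per-rollout NIG outputs before planning, so that only well-supported predictions influence placement decisions.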
Problem

Research questions and friction points this paper is trying to address.

Predicts OpenMP task performance on heterogeneous embedded SoCs
Models task DAG, code semantics, and runtime context in a graph
Enables risk-aware scheduling to reduce makespan and energy consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

GraphPerf-RT uses a typed-edge heterogeneous graph to model task DAG topology, CFG code semantics, and runtime context
Multi-task evidential heads predict performance metrics with calibrated uncertainty
Integration with model-based RL enables risk-aware scheduling on embedded platforms
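The typed-edge heterogeneous graph in the first bullet can be pictured as node stores keyed by type and edge stores keyed by (source type, relation, destination type) triples. The following is a hypothetical plain-Python sketch, not the paper's data model; the node types (`task`, `core`), features (`flops`, `freq_mhz`, `temp_c`, `util`), and relation names (`precedes`, `placed_on`) are illustrative stand-ins for the precedence, placement, and contention edges the abstract describes.

```python
from collections import defaultdict

class HeteroGraph:
    """Minimal typed-edge heterogeneous graph: typed node stores plus
    edge lists keyed by (src_type, relation, dst_type) triples."""

    def __init__(self):
        self.nodes = defaultdict(dict)   # node_type -> {node_id: feature dict}
        self.edges = defaultdict(list)   # (src_type, rel, dst_type) -> [(u, v), ...]

    def add_node(self, ntype, nid, **features):
        self.nodes[ntype][nid] = features

    def add_edge(self, src_type, rel, dst_type, u, v):
        self.edges[(src_type, rel, dst_type)].append((u, v))

g = HeteroGraph()
g.add_node("task", "t0", flops=1e6)
g.add_node("task", "t1", flops=5e5)
g.add_node("core", "big0", freq_mhz=1900, temp_c=54.0, util=0.35)
g.add_edge("task", "precedes", "task", "t0", "t1")     # DAG precedence edge
g.add_edge("task", "placed_on", "core", "t0", "big0")  # placement edge
```

In a real HGNN pipeline this structure maps naturally onto relation-specific message passing (one weight matrix per edge type), which is what lets one model jointly reason over DAG, CFG, and hardware-state information.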