🤖 AI Summary
In resource-disaggregated HPC systems, remote memory access incurs substantial latency, necessitating efficient quantification of an application's sensitivity to memory latency and its memory-level parallelism (MLP). Existing approaches rely on custom hardware or cycle-accurate simulation, suffering from poor flexibility and high overhead. This paper introduces the first lightweight, runtime instruction-trace-based framework that constructs an execution directed acyclic graph (DAG) and—uniquely—integrates DAG critical-path analysis with memory access pattern modeling to theoretically bound latency sensitivity and MLP. The framework is portable across diverse hardware configurations. Evaluated on PolyBench, HPCG, and LULESH, it achieves prediction errors under 8% for performance bounds while accelerating analysis by three orders of magnitude compared to cycle-accurate simulation.
📝 Abstract
Resource disaggregation is a promising technique for improving the efficiency of large-scale computing systems. However, it comes at the cost of increased memory access latency, since data must traverse the network fabric between remote nodes. It is therefore crucial to ascertain an application's memory latency sensitivity in order to minimize the overall performance impact. Existing tools for measuring memory latency sensitivity often rely on custom hardware or cycle-accurate simulators, which can be inflexible and time-consuming. To address this, we present EDAN (Execution DAG Analyzer), a novel performance analysis tool that leverages an application's runtime instruction trace to generate its corresponding execution DAG. This approach allows us to estimate the latency sensitivity of sequential programs and to investigate the impact of different hardware configurations. EDAN not only computes theoretical bounds on performance metrics, but also provides insight into the memory-level parallelism inherent in HPC applications. We apply EDAN to benchmarks and applications such as PolyBench, HPCG, and LULESH to unveil their intrinsic memory-level parallelism and latency sensitivity.
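To give a feel for the kind of analysis the abstract describes, the sketch below builds a toy execution DAG from a handful of instructions and runs a critical-path computation over it. The node names, latencies, dependency edges, and the MLP estimate are all illustrative assumptions for this example, not EDAN's actual trace format or algorithm: independent loads can overlap (raising MLP), while a dependent "pointer-chase" load serializes on the critical path (raising latency sensitivity).

```python
# Minimal sketch of execution-DAG critical-path analysis.
# All names, latencies, and the MLP formula are illustrative
# assumptions, not EDAN's actual implementation.

# Each node: (latency_cycles, is_memory_op); `deps` maps a node to the
# nodes it depends on (its data-dependency predecessors).
nodes = {
    "ld_a": (100, True),   # remote load
    "ld_b": (100, True),   # independent remote load, overlaps ld_a
    "add":  (1,   False),  # consumes both loads
    "ld_c": (100, True),   # address depends on add (pointer chase)
    "mul":  (1,   False),
}
deps = {"add": ["ld_a", "ld_b"], "ld_c": ["add"], "mul": ["ld_c"]}

def finish_times(nodes, deps):
    """Earliest finish time of each node assuming unlimited parallelism."""
    finish = {}
    def t(n):
        if n not in finish:
            finish[n] = nodes[n][0] + max(
                (t(p) for p in deps.get(n, [])), default=0)
        return finish[n]
    for n in nodes:
        t(n)
    return finish

finish = finish_times(nodes, deps)
critical_path_len = max(finish.values())  # lower bound on runtime

# Memory latency serialized on the critical path: walk the path
# backwards, always following the latest-finishing predecessor.
n, serial_mem = max(finish, key=finish.get), 0
while n is not None:
    lat, is_mem = nodes[n]
    serial_mem += lat if is_mem else 0
    preds = deps.get(n, [])
    n = max(preds, key=lambda p: finish[p]) if preds else None

# Crude MLP estimate: total memory latency over serialized memory
# latency. Here 300 / 200 = 1.5, since ld_a and ld_b overlap.
total_mem = sum(lat for lat, is_mem in nodes.values() if is_mem)
mlp = total_mem / serial_mem

print(critical_path_len, mlp)  # 202 1.5
```

In this toy DAG, doubling the remote-load latency roughly doubles the critical path (high latency sensitivity), whereas a DAG of many independent loads would absorb the extra latency through overlap — the distinction EDAN's bounds are meant to quantify.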