EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC

📅 2025-12-15
🤖 AI Summary
In resource-disaggregated HPC systems, remote memory access incurs substantial latency. Designers therefore need an efficient way to quantify how sensitive an application is to memory latency and how much memory-level parallelism (MLP) it exposes. Existing approaches rely on custom hardware or cycle-accurate simulation, which are inflexible and costly. This paper introduces the first lightweight framework based on runtime instruction traces. It constructs an execution directed acyclic graph (DAG) and, uniquely, combines DAG critical-path analysis with memory access pattern modeling to derive theoretical bounds on latency sensitivity and MLP. The framework is portable across diverse hardware configurations. Evaluated on PolyBench, HPCG, and LULESH, it predicts performance bounds with errors under 8% and runs three orders of magnitude faster than cycle-accurate simulation.
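The bounding idea can be illustrated on a toy DAG. The minimal sketch below is not EDAN's implementation. The graph, the node names, the two node kinds, and the unit ALU cost are all hypothetical. It computes the critical-path length as a function of a uniform memory latency L. The critical path is a maximum over paths of (ALU cost + memory ops × L), so its slope with respect to L counts the memory accesses serialized on that path. This is one simple way to express latency sensitivity.

```python
from graphlib import TopologicalSorter

# Hypothetical toy execution DAG: node -> (kind, predecessor nodes).
# "mem" nodes pay the memory latency L; "alu" nodes cost one cycle.
DAG = {
    "ld_a": ("mem", []),
    "ld_b": ("mem", []),
    "add":  ("alu", ["ld_a", "ld_b"]),
    "ld_c": ("mem", ["add"]),  # address computed from the two loaded values
    "mul":  ("alu", ["ld_c"]),
}


def critical_path(dag, mem_latency, alu_latency=1):
    """Return (critical-path length, memory ops on that path) for a uniform memory latency.

    Unlimited issue width and unlimited outstanding requests are assumed, so the
    runtime is bounded below by the longest latency-weighted path.
    """
    order = TopologicalSorter({n: preds for n, (_, preds) in dag.items()}).static_order()
    finish = {}  # node -> (finish time, memory ops on the longest path ending here)
    for node in order:
        kind, preds = dag[node]
        # Tuples compare by time first; ties prefer the path with more memory ops,
        # i.e. the one whose length grows fastest as the latency increases.
        start, mems = max((finish[p] for p in preds), default=(0, 0))
        cost = mem_latency if kind == "mem" else alu_latency
        finish[node] = (start + cost, mems + (kind == "mem"))
    return max(finish.values())


for latency in (100, 400):
    length, mems = critical_path(DAG, latency)
    print(f"L={latency}: T={length}, serialized loads={mems}")
# L=100: T=202, serialized loads=2
# L=400: T=802, serialized loads=2
```

Here T(L) = 2L + 2. The loads ld_a and ld_c on the path ld_a → add → ld_c → mul are serialized, so each extra cycle of memory latency adds two cycles to the bound. The load ld_b overlaps with ld_a and does not contribute.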


📝 Abstract
Resource disaggregation is a promising technique for improving the efficiency of large-scale computing systems. However, this comes at the cost of increased memory access latency due to the need to rely on the network fabric to transfer data between remote nodes. As such, it is crucial to ascertain an application's memory latency sensitivity to minimize the overall performance impact. Existing tools for measuring memory latency sensitivity often rely on custom ad-hoc hardware or cycle-accurate simulators, which can be inflexible and time-consuming. To address this, we present EDAN (Execution DAG Analyzer), a novel performance analysis tool that leverages an application's runtime instruction trace to generate its corresponding execution DAG. This approach allows us to estimate the latency sensitivity of sequential programs and investigate the impact of different hardware configurations. EDAN not only provides us with the capability of calculating the theoretical bounds for performance metrics, but it also helps us gain insight into the memory-level parallelism inherent to HPC applications. We apply EDAN to applications and benchmarks such as PolyBench, HPCG, and LULESH to unveil the characteristics of their intrinsic memory-level parallelism and latency sensitivity.
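Trace-to-DAG construction can be sketched in the same spirit. In the minimal example below, the trace tuple format and the names TRACE, build_dag, and memory_parallelism are all hypothetical. Registers are assumed to be renamed, so only read-after-write dependencies become edges, whether they pass through registers or through memory addresses. The MLP figure is one illustrative estimate, not the paper's definition. It assumes every load costs the same large latency and ALU time is ignored. Under those assumptions, the critical path spans as many load latencies as the deepest chain of dependent loads. Dividing the total load latency by that span gives the average number of loads in flight.

```python
# Hypothetical trace records: (opcode, destination register, source registers, memory address).
# ALU ops use None for the address; stores have no destination register.
TRACE = [
    ("load",  "r1", [],           0x100),
    ("load",  "r2", [],           0x200),
    ("add",   "r3", ["r1", "r2"], None),
    ("load",  "r4", ["r3"],       0x300),  # address depends on r3
    ("store", None, ["r4"],       0x100),
    ("load",  "r5", [],           0x100),  # reads the value stored above
]


def build_dag(trace):
    """Return, for each instruction, the sorted indices of the instructions it depends on.

    Only true (read-after-write) dependencies are recorded, assuming register renaming.
    """
    last_reg_writer, last_mem_writer = {}, {}
    preds = []
    for i, (op, dst, srcs, addr) in enumerate(trace):
        deps = {last_reg_writer[r] for r in srcs if r in last_reg_writer}
        if op == "load" and addr in last_mem_writer:
            deps.add(last_mem_writer[addr])  # dependency through memory
        preds.append(sorted(deps))
        if dst is not None:
            last_reg_writer[dst] = i
        if op == "store":
            last_mem_writer[addr] = i
    return preds


def memory_parallelism(trace, preds):
    """Average loads in flight when every load has the same large latency and ALU time is ignored."""
    depth = []  # depth[i] = loads on the longest dependency chain ending at instruction i
    for i, (op, *_rest) in enumerate(trace):
        depth.append(max((depth[p] for p in preds[i]), default=0) + (op == "load"))
    n_loads = sum(op == "load" for op, *_rest in trace)
    return n_loads / max(depth) if n_loads else 0.0


preds = build_dag(TRACE)
print(preds)                              # [[], [], [0, 1], [2], [3], [4]]
print(memory_parallelism(TRACE, preds))   # 1.3333333333333333
```

For this trace, the deepest chain holds three loads: r1 (or r2), then r4, then r5, which reads the value stored to 0x100. Four loads over a span of three load latencies give an average MLP of 4/3.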
Problem

Research questions and friction points this paper is trying to address.

Analyzes memory latency sensitivity in disaggregated HPC systems
Estimates performance bounds and memory parallelism from instruction traces
Evaluates applications like PolyBench and HPCG for latency impact
Innovation

Methods, ideas, or system contributions that make the work stand out.

Builds an execution DAG from runtime instruction traces
Estimates latency sensitivity of sequential programs
Investigates memory-level parallelism in HPC applications
Siyuan Shen
School of Information Science and Technology, ShanghaiTech University
Computer vision, Computational photography
Mikhail Khalilov
ETH Zürich, Switzerland
Lukas Gianinazzi
ETH Zürich, Switzerland
Timo Schneider
Karlsruhe Institute of Technology
Computer Vision
Marcin Chrapek
ETH Zürich, Switzerland
Jai Dayal
Cerebras Systems, USA
Manisha Gajbe
Not Affiliated, USA
Robert Wisniewski
Hewlett Packard Enterprise, USA
Torsten Hoefler
Professor of Computer Science at ETH Zurich
High Performance Computing, Deep Learning, Networking, Message Passing Interface, Parallel and Distributed Computing