🤖 AI Summary
To address the lack of efficient, automated offloading decision support for Compute-Near-Memory (CNM) systems during system-level design, this paper proposes CoMoNM, the first hardware-agnostic, millisecond-scale, end-to-end execution-time cost modeling framework tailored to CNM. CoMoNM employs analytical modeling based on high-level program representations, target system specifications, and memory-mapping strategies, enabling high-accuracy performance prediction without time-consuming cycle-accurate simulation. Its modeling pipeline integrates seamlessly into mainstream CNM compilers, significantly accelerating offloading optimization. Evaluated on a real UPMEM DPU platform and the Samsung HBM-PIM simulator, CoMoNM achieves mean absolute prediction errors of only 7.80% and 2.99%, respectively, while delivering speedups of seven orders of magnitude over state-of-the-art simulators.
📄 Abstract
Compute-Near-Memory (CNM) systems offer a promising approach to mitigating the von Neumann bottleneck by bringing computational units closer to data. However, optimizing for these architectures remains challenging due to their unique hardware and programming models. Existing CNM compilers often rely on manual programmer annotations for offloading and optimization decisions. Automating these decisions by exploring the optimization space, as is common for CPU/GPU systems, is difficult for CNMs because constructing and navigating the transformation space is tedious and time-consuming. This is particularly the case during system-level design, where evaluation requires time-consuming simulations. To address this, we present CoMoNM, a generic cost modeling framework that estimates execution time on CNM systems in milliseconds. It takes a high-level, hardware-agnostic application representation, target system specifications, and a mapping specification as input, and estimates the execution time of the given application on the target CNM system. We show how CoMoNM can be seamlessly integrated into state-of-the-art CNM compilers, enabling improved offloading decisions. Evaluation on established CNM benchmarks shows estimation errors within 7.80% and 2.99% compared to the real UPMEM CNM system and Samsung's HBM-PIM simulator, respectively. Notably, CoMoNM delivers estimates seven orders of magnitude faster than the UPMEM and HBM-PIM simulators.
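To make the abstract's input/output contract concrete, the following is a minimal, purely illustrative sketch of what an analytical CNM cost model with CoMoNM's interface shape might look like. All class names, fields, and the roofline-style formula below are assumptions for illustration only; they are not the paper's actual model or API.

```python
from dataclasses import dataclass

# Hypothetical inputs mirroring the abstract: an application summary,
# a target system specification, and an offloading/mapping choice.
# Every name and formula here is an illustrative assumption.

@dataclass
class KernelSpec:
    ops: float             # total arithmetic operations in the kernel
    bytes_accessed: float  # bytes moved between memory and compute units

@dataclass
class SystemSpec:
    units: int             # number of near-memory compute units
    ops_per_unit: float    # peak ops/s of a single unit
    bw_per_unit: float     # local memory bandwidth per unit (bytes/s)
    host_bw: float         # host <-> CNM transfer bandwidth (bytes/s)

def estimate_time(k: KernelSpec, s: SystemSpec, offload_fraction: float) -> float:
    """Roofline-style estimate (seconds) for the offloaded portion:
    compute and local memory access overlap; host transfer does not."""
    offloaded_ops = k.ops * offload_fraction
    offloaded_bytes = k.bytes_accessed * offload_fraction
    compute_t = offloaded_ops / (s.units * s.ops_per_unit)
    memory_t = offloaded_bytes / (s.units * s.bw_per_unit)
    transfer_t = offloaded_bytes / s.host_bw  # ship inputs/results once
    return max(compute_t, memory_t) + transfer_t
```

Because such a model is a closed-form expression rather than a simulation loop, evaluating it takes microseconds to milliseconds, which is what makes sweeping many offloading and mapping choices inside a compiler practical.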