🤖 AI Summary
To address the lack of efficient, automated offloading decision support for Compute-Near-Memory (CNM) systems during system-level design, this paper proposes CoMoNM, the first hardware-agnostic, millisecond-scale, end-to-end execution-time cost modeling framework tailored to CNM. CoMoNM employs analytical modeling based on high-level program representations, target system specifications, and memory-mapping strategies, enabling high-accuracy performance prediction without time-consuming cycle-accurate simulation. Its modeling pipeline integrates seamlessly into mainstream CNM compilers, significantly accelerating offloading optimization. Evaluated on a real UPMEM DPU platform and the Samsung HBM-PIM simulator, CoMoNM achieves mean absolute prediction errors of only 7.80% and 2.99%, respectively, while delivering speedups of seven orders of magnitude over state-of-the-art simulators.
📄 Abstract
Compute-Near-Memory (CNM) systems offer a promising approach to mitigating the von Neumann bottleneck by bringing computational units closer to data. However, optimizing for these architectures remains challenging due to their unique hardware and programming models. Existing CNM compilers often rely on manual programmer annotations for offloading and optimization decisions. Automating these decisions by exploring the optimization space, as is common for CPU/GPU systems, is difficult for CNMs because constructing and navigating the transformation space is tedious and time-consuming. This is particularly the case during system-level design, where evaluation requires time-consuming simulations. To address this, we present CoMoNM, a generic cost modeling framework that estimates execution time on CNM systems in milliseconds. It takes a high-level, hardware-agnostic application representation, target system specifications, and a mapping specification as input, and estimates the execution time of the given application on the target CNM system. We show how CoMoNM can be seamlessly integrated into state-of-the-art CNM compilers, enabling improved offloading decisions. Evaluation on established CNM benchmarks shows estimation errors within 7.80% and 2.99% compared to the real UPMEM CNM system and Samsung's HBM-PIM simulator, respectively. Notably, CoMoNM delivers estimates seven orders of magnitude faster than the UPMEM and HBM-PIM simulators.
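To make the abstract's input/output contract concrete, the following is a minimal, purely illustrative sketch of what an analytical CNM cost model with CoMoNM's interface shape might look like. All class names, fields, and the roofline-style formula below are assumptions for illustration only; they are not the paper's actual model or API.

```python
from dataclasses import dataclass

# Hypothetical inputs mirroring the abstract: an application summary,
# a target system specification, and an offloading/mapping choice.
# Every name and formula here is an illustrative assumption.

@dataclass
class KernelSpec:
    ops: float             # total arithmetic operations in the kernel
    bytes_accessed: float  # bytes moved between memory and compute units

@dataclass
class SystemSpec:
    units: int             # number of near-memory compute units
    ops_per_unit: float    # peak ops/s of a single unit
    bw_per_unit: float     # local memory bandwidth per unit (bytes/s)
    host_bw: float         # host <-> CNM transfer bandwidth (bytes/s)

def estimate_time(k: KernelSpec, s: SystemSpec, offload_fraction: float) -> float:
    """Roofline-style estimate (seconds) for the offloaded portion:
    compute and local memory access overlap; host transfer does not."""
    offloaded_ops = k.ops * offload_fraction
    offloaded_bytes = k.bytes_accessed * offload_fraction
    compute_t = offloaded_ops / (s.units * s.ops_per_unit)
    memory_t = offloaded_bytes / (s.units * s.bw_per_unit)
    transfer_t = offloaded_bytes / s.host_bw  # ship inputs/results once
    return max(compute_t, memory_t) + transfer_t
```

Because such a model is a closed-form expression rather than a simulation loop, evaluating it takes microseconds to milliseconds, which is what makes sweeping many offloading and mapping choices inside a compiler practical.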