CXLMemSim: A pure software simulated CXL.mem for performance characterization

📅 2023-03-10
🏛️ arXiv.org
📈 Citations: 13
Influential: 1
🤖 AI Summary
Background: Because CXL.mem hardware is not yet commercially available, evaluating CXL-based memory systems on real devices is difficult. Method: This paper proposes CXLMemSim, a lightweight, pure-software simulator that attaches to an unmodified program, divides its execution into epochs, and infers each epoch's simulated execution time from performance monitoring events (PMEs). It requires no source-code modification and no full-system simulation, and it lets users adjust the multi-tier memory hierarchy and access latencies. Contribution/Results: Because PMEs are supported by most commodity processors, the simulator runs on mainstream x86 platforms. In a preliminary evaluation it slows attached real-world applications by 4.41× on average, avoiding the overhead of full-system simulators such as Gem5. It is designed to capture CXL.mem's higher latency, lower bandwidth, and heterogeneous memory tiers, and is intended to enable system-level research such as memory scheduling for complex applications. The simulator offers an efficient, practical evaluation infrastructure for early-stage exploration of the CXL ecosystem.
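The summary mentions configurable multi-tier memory hierarchies and access latencies. A minimal sketch of what such a configuration could look like is below; the `MemoryTier` type, tier names, and all numbers are hypothetical illustrations, not CXLMemSim's actual configuration format or measured values. It shows the idea that each tier's cost is expressed as a latency penalty relative to local DRAM, and that a latency can be changed without touching the rest of the hierarchy.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MemoryTier:
    """One level of a hypothetical memory hierarchy (not CXLMemSim's real config format)."""
    name: str
    latency_ns: float      # assumed access latency of this tier
    bandwidth_gbps: float  # assumed peak bandwidth in GB/s

def extra_latency_ns(tiers, local_name="local_dram"):
    """Return each tier's latency penalty relative to the local DRAM tier."""
    base = next(t.latency_ns for t in tiers if t.name == local_name)
    return {t.name: t.latency_ns - base for t in tiers}

def with_latency(tiers, name, latency_ns):
    """Return a new hierarchy with one tier's latency changed; the input is left untouched."""
    return [replace(t, latency_ns=latency_ns) if t.name == name else t for t in tiers]

# Illustrative numbers only; they are not measurements from the paper.
TIERS = [
    MemoryTier("local_dram", 85.0, 100.0),
    MemoryTier("cxl_near", 170.0, 32.0),
    MemoryTier("cxl_far", 300.0, 16.0),
]

print(extra_latency_ns(TIERS))  # {'local_dram': 0.0, 'cxl_near': 85.0, 'cxl_far': 215.0}
print(extra_latency_ns(with_latency(TIERS, "cxl_far", 400.0))["cxl_far"])  # 315.0
```

Using frozen dataclasses keeps each configuration immutable, so sweeping latencies for an experiment produces fresh hierarchies instead of mutating shared state.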
📝 Abstract
The emerging CXL.mem standard provides a new type of byte-addressable remote memory with a variety of memory types and hierarchies. With CXL.mem, multiple layers of memory -- e.g., local DRAM and CXL-attached remote memory at different locations -- are exposed to operating systems and user applications, bringing new challenges and research opportunities. Unfortunately, since CXL.mem devices are not commercially available, it is difficult for researchers to conduct systems research that uses CXL.mem. In this paper, we present our ongoing work, CXLMemSim, a fast and lightweight CXL.mem simulator for performance characterization. CXLMemSim uses a performance model driven using performance monitoring events, which are supported by most commodity processors. Specifically, CXLMemSim attaches to an existing, unmodified program, and divides the execution of the program into multiple epochs; once an epoch finishes, CXLMemSim collects performance monitoring events and calculates the simulated execution time of the epoch based on these events. Through this method, CXLMemSim avoids the performance overhead of a full-system simulator (e.g., Gem5) and allows the memory hierarchy and latency to be easily adjusted, enabling research such as memory scheduling for complex applications. Our preliminary evaluation shows that CXLMemSim slows down the execution of the attached program by 4.41x on average for real-world applications.
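The abstract describes an epoch loop: run the program natively, read performance monitoring events when each epoch ends, and compute the epoch's simulated time from those events. The sketch below illustrates that idea with a deliberately simple linear model, assuming the counters yield last-level-cache misses attributed to each emulated tier. The `EpochSample` fields, the `overlap` factor, and the additive penalty formula are assumptions for illustration, not the paper's actual model.

```python
from dataclasses import dataclass

@dataclass
class EpochSample:
    """Counters assumed to be read at the end of one epoch (field names are hypothetical)."""
    native_ns: float       # wall time the epoch actually took on the host
    misses_per_tier: dict  # tier name -> LLC misses attributed to that tier

def simulate_epoch_ns(sample, penalty_ns, overlap=1.0):
    """Simulated epoch time = native time + per-miss latency penalties.

    penalty_ns maps tier name -> extra latency versus local DRAM.
    overlap is an assumed count of misses serviced in parallel (>= 1),
    a crude stand-in for memory-level parallelism.
    """
    if overlap < 1.0:
        raise ValueError("overlap must be >= 1")
    added = sum(n * penalty_ns[tier] for tier, n in sample.misses_per_tier.items())
    return sample.native_ns + added / overlap

def simulate_run_ns(samples, penalty_ns, overlap=1.0):
    """Total simulated time is the sum of the simulated epoch times."""
    return sum(simulate_epoch_ns(s, penalty_ns, overlap) for s in samples)

penalty = {"local_dram": 0.0, "cxl_near": 85.0}
epochs = [
    EpochSample(1_000_000.0, {"local_dram": 1000, "cxl_near": 2000}),
    EpochSample(500_000.0, {"cxl_near": 1000}),
]
print(simulate_run_ns(epochs, penalty))               # 1755000.0
print(simulate_run_ns(epochs, penalty, overlap=2.0))  # 1627500.0
```

Note that the simulated time computed here is distinct from the simulator's own cost: the 4.41× figure in the abstract is the wall-clock slowdown of the attached program, not a property of the simulated times.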
Problem

Research questions and friction points this paper is trying to address.

Simulate CXL.mem performance without real hardware
Evaluate memory pooling and disaggregation in software
Analyze latency and bandwidth effects of CXL.mem
Innovation

Methods, ideas, or system contributions that make the work stand out.

Software-based CXL.mem simulation framework
Traces memory accesses with kernel probes
Emulates CXL latency/bandwidth without the cost of full-system simulation (4.41× average slowdown)
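The items above mention tracing memory activity and emulating both latency and bandwidth. The sketch below shows two pieces such a simulator might need, under stated assumptions: a sorted interval map that attributes a sampled address to the tier its allocation was placed on (here allocations are registered by hand; in practice they might come from probes on allocation calls, which is an assumption), and a bandwidth check that stretches an epoch when a tier cannot move the observed bytes in time. `TierMap`, `bandwidth_stretch_ns`, and the numbers used are hypothetical, not CXLMemSim's code.

```python
import bisect

class TierMap:
    """Hypothetical map from non-overlapping allocated address ranges to memory tiers."""

    def __init__(self):
        self._starts = []  # sorted start addresses, used for binary search
        self._ranges = []  # (start, end, tier) tuples kept in the same order

    def add(self, start, size, tier):
        """Record an allocation [start, start + size) placed on the given tier."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._ranges.insert(i, (start, start + size, tier))

    def tier_of(self, addr, default="local_dram"):
        """Return the tier owning addr, or the default tier if no range covers it."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, tier = self._ranges[i]
            if start <= addr < end:
                return tier
        return default

def bandwidth_stretch_ns(epoch_ns, bytes_moved, bandwidth_gbps):
    """Extra time needed if the tier's assumed bandwidth cannot carry bytes_moved within epoch_ns."""
    # 1 GB/s (decimal) equals 1 byte/ns, so bytes / (GB/s) gives nanoseconds.
    needed_ns = bytes_moved / bandwidth_gbps
    return max(0.0, needed_ns - epoch_ns)

m = TierMap()
m.add(0x1000, 0x1000, "cxl_near")
m.add(0x8000, 0x100, "cxl_far")
print(m.tier_of(0x1800))  # cxl_near
print(m.tier_of(0x2000))  # local_dram (one past the end of the first range)
print(bandwidth_stretch_ns(1000.0, 64000, 32.0))  # 1000.0
```

Binary search keeps each lookup logarithmic in the number of tracked allocations, which matters when many sampled addresses must be attributed per epoch.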