🤖 AI Summary
Existing full-system simulators model CXL memory expansion with oversimplified or non-compliant architectures, which makes it difficult to accurately evaluate CXL performance in large language model (LLM) training and inference. This work presents CXLRAMSim, a high-fidelity full-system simulator integrated into gem5 that models CXL devices at their correct position on the I/O bus, so an unmodified Linux kernel and the complete software stack run without application modifications. It is presented as the first simulator to faithfully model the CXL.mem protocol and physical interconnects at the full-system level, and it implements interleaved accesses between system DRAM and CXL-attached memory. It captures critical performance challenges such as cache pollution induced by CXL memory accesses, providing a reliable foundation for evaluating and optimizing CXL-based system designs.
📝 Abstract
The growing demands of training and inference for Large Language Models (LLMs) are accelerating the adoption of scale-up systems that extend server shared memory through Compute Express Link (CXL)-based load/store interconnects. Accurate full-system simulation of such architectures remains challenging, as existing tools (all very recent) rely on simplified or non-compliant architectural models, which limits their accuracy and usability. We present CXLRAMSim, the first gem5-integrated, full-system simulator that models CXL devices at their correct position on the I/O bus, enabling the use of an unmodified Linux kernel and software stack, realistic latency and bandwidth behavior, and true interleaving with system DRAM. Our approach provides high-fidelity CXL.mem characterization and captures key challenges such as cache pollution when accessing CXL memory.