AI Summary
This work addresses the high cost and low utilization of DRAM in large-scale AI training and inference, which call for efficient memory disaggregation. Existing simulation tools, however, offer limited support for systems built on the Compute Express Link (CXL) protocol. To bridge this gap, we propose CXL-ClusterSim, the first full-system simulation framework that integrates gem5's high-fidelity processor modeling with SST's parallel simulation capabilities to accurately model memory pooling and sharing mechanisms under the CXL protocol. The platform enables scalable, flexible, and efficient simulation of CXL-based disaggregated memory clusters, accelerating design space exploration for novel architectures and providing a tool for hardware/software co-design and computer architecture research.
Abstract
Large-scale AI training and inference require hundreds of gigabytes to terabytes of DRAM and exhibit high peak-to-average utilization ratios, which leads to overprovisioning. In cloud computing, DRAM constitutes a significant share of the cost. Yet, as shown by recent articles, DRAM is heavily underutilized. Memory disaggregation is a solution to both of these problems. With the advent of the CXL protocol, there is renewed interest in designing and optimizing computing systems with disaggregated memory. However, at present, few simulation tools are available for exploring the design space and evaluating the performance tradeoffs of computer systems with disaggregated memory.
In this paper, we propose CXL-ClusterSim, a full-system modeling and simulation framework that combines the gem5 simulator, for fidelity, with the Structural Simulation Toolkit (SST), for parallel simulation. We outline the challenges in creating this simulation infrastructure and present a design that is scalable, flexible, and reasonably fast. The framework helps computer architects explore the design space of CXL-based disaggregated memory and identify new opportunities for hardware/software co-design and performance optimization.