🤖 AI Summary
To address the high cost, limited capacity, and volatility of DRAM, this work presents the first system-level performance characterization and application adaptation study of Samsung's CXL Memory Module Hybrid (CMM-H) prototype, which integrates DRAM and NAND flash on an FPGA platform, targeting data-intensive workloads such as AI/ML and HPC. It proposes a memory-semantics-based deployment scheme for NAND-backed memory that bypasses the overheads of the traditional block-device I/O stack, and develops an evaluation framework spanning multi-tiered microbenchmarks and real applications. Experimental results show that CMM-H achieves over 90% of DRAM performance across mainstream AI, HPC, and database workloads, with average memory access latency under 1.5 μs. The study further delineates the device's applicability boundaries and identifies key tuning strategies for optimal deployment, providing the first comprehensive empirical validation and practical engineering guidance for deploying CXL-based hybrid memory systems.
📝 Abstract
The growing prevalence of data-intensive workloads, such as artificial intelligence (AI), machine learning (ML), high-performance computing (HPC), in-memory databases, and real-time analytics, has exposed limitations in conventional memory technologies like DRAM. While DRAM offers low latency and high throughput, it is constrained by high cost, scalability challenges, and volatility, making it less viable for capacity-bound and persistence-requiring applications in modern datacenters. Recently, Compute Express Link (CXL) has emerged as a promising alternative, enabling high-speed, cacheline-granular communication between CPUs and external devices. By leveraging CXL technology, NAND flash can now be used for memory expansion, offering threefold benefits: byte-addressability, scalable capacity, and persistence at low cost. Samsung's CXL Memory Module Hybrid (CMM-H) is the first product to deliver these benefits through a hardware-only solution, i.e., it does not incur the OS and I/O overheads of conventional block devices. In particular, CMM-H integrates a DRAM cache with NAND flash in a single device to deliver near-DRAM latency. This paper presents the first publicly available, comprehensive characterization study of an FPGA-based CMM-H prototype. Through this study, we address users' concerns about whether a wide variety of applications can successfully run on a memory device backed by a NAND flash medium. Additionally, based on these characterizations, we provide key insights into how to best take advantage of the CMM-H device.
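The contrast the abstract draws between block-device I/O and CXL-style memory semantics can be sketched in miniature. The snippet below is a hypothetical illustration only: it uses an ordinary temporary file in place of CXL-attached memory (which the OS would normally expose as a plain address range), and the sizes and offsets are placeholders, not the CMM-H interface.

```python
import mmap
import os
import tempfile

# Stand-in backing store; with real CXL memory this would simply be
# part of the physical address space, not a file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)

# Block-device style: a read() syscall pulls a whole block through the
# OS I/O stack even when only a single byte is needed.
os.lseek(fd, 0, os.SEEK_SET)
block = os.read(fd, 4096)            # 4 KiB transferred for 1 byte of interest
one_byte_via_block = block[100]

# Memory-semantic style: after mapping, plain loads and stores reach
# individual bytes directly, with no per-access syscall.
with mmap.mmap(fd, 4096) as mem:
    mem[100] = 42                    # byte-granular store
    one_byte_via_mmap = mem[100]     # byte-granular load

os.close(fd)
os.remove(path)
print(one_byte_via_mmap)
```

The point of the sketch is granularity and path length: the block route moves an entire block through the kernel per access, while the mapped route lets ordinary CPU loads/stores touch bytes in place, which is the behavior CMM-H provides in hardware without block-layer overhead.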