🤖 AI Summary
Deployment of Compute Express Link (CXL) memory in high-performance computing (HPC) and large language model (LLM) workloads remains challenging due to an insufficient understanding of its performance boundaries and of memory-hierarchy coordination. Method: This work conducts a systematic evaluation across three real-world CXL expansion cards, integrating CXL protocol analysis, multi-level memory-topology simulation and measurement, LLM inference memory-footprint modeling, and HPC application profiling. We propose the first data-object-level dynamic memory interleaving strategy, enabling real-time alignment between interleaving granularity and fine-grained access patterns. Contribution/Results: Quantitative characterization reveals significant inter-vendor differences in bandwidth and latency. Experiments show CXL memory improves LLM model-loading throughput by 2.3×; our object-level interleaving reduces average memory access latency by 19% versus conventional page-level interleaving. The study identifies critical bottlenecks, and corresponding optimization avenues, for scalable CXL deployment in HPC and AI workloads.
📝 Abstract
Compute Express Link (CXL) is emerging as a promising memory interface technology. Because CXL devices are not yet widely available, the performance of CXL memory remains largely unknown. What are the use cases for CXL memory? How does CXL memory affect application performance? How should CXL memory be used in combination with existing memory components? In this work, we study the performance of three genuine CXL memory-expansion cards from different vendors. We characterize the basic performance of CXL memory, examine how HPC applications and large language models can benefit from it, and investigate the interplay between memory tiering and page interleaving. We also propose a novel data-object-level interleaving policy that matches the interleaving policy to memory access patterns. We reveal the challenges and opportunities of using CXL memory.