Architectural and System Implications of CXL-enabled Tiered Memory

📅 2025-03-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses system-level challenges introduced by CXL memory—including high latency, limited parallelism, and severe interference with DDR bandwidth (up to 81%)—tracing their root causes to unfair request scheduling, degraded inter-core synchronization, and contention for shared hardware resources. To mitigate these issues, we propose MIKU, a dynamic memory request regulation mechanism. MIKU introduces the first service-time-aware, cooperative priority scheduler for CXL and DDR requests, enabling fair allocation and efficiency-oriented coordination across heterogeneous memory resources. Leveraging microbenchmark-driven analysis, cycle-accurate hardware modeling, and protocol-stack co-optimization, MIKU recovers 98% of peak DDR bandwidth under heavy load, reduces CXL average latency variation by 42%, and significantly improves overall memory efficiency.

Technology Category

Application Category

📝 Abstract
Memory disaggregation is an emerging technology that decouples memory from traditional memory buses, enabling independent scaling of compute and memory. Compute Express Link (CXL), an open-standard interconnect technology, facilitates memory disaggregation by allowing processors to access remote memory through the PCIe bus while preserving the shared-memory programming model. This innovation creates a tiered memory architecture combining local DDR and remote CXL memory with distinct performance characteristics. In this paper, we investigate the architectural implications of CXL memory, focusing on its increased latency and performance heterogeneity, which can undermine the efficiency of existing processor designs optimized for (relatively) uniform memory latency. Using carefully designed micro-benchmarks, we identify bottlenecks such as limited hardware-level parallelism in CXL memory, unfair queuing in memory request handling, and its impact on DDR memory performance and inter-core synchronization. Our findings reveal that the disparity in memory tier parallelism can reduce DDR memory bandwidth by up to 81% under heavy loads. To address these challenges, we propose a Dynamic Memory Request Control mechanism, MIKU, that prioritizes DDR memory requests while serving CXL memory requests on a best-effort basis. By dynamically adjusting CXL request rates based on service time estimates, MIKU achieves near-peak DDR throughput while maintaining high performance for CXL memory. Our evaluation with micro-benchmarks and representative workloads demonstrates the potential of MIKU to enhance tiered memory system efficiency.
Problem

Research questions and friction points this paper is trying to address.

Investigates CXL memory's latency and performance heterogeneity impact
Identifies bottlenecks in CXL memory parallelism and queuing
Proposes MIKU mechanism to optimize tiered memory efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

CXL enables tiered memory via PCIe interconnect
Dynamic Memory Request Control prioritizes DDR access
MIKU adjusts CXL requests for optimal performance
🔎 Similar Papers
No similar papers found.
Y
Yujie Yang
The University of Texas at Arlington
L
Lingfeng Xiang
The University of Texas at Arlington
P
Peiran Du
The University of Texas at Arlington
Z
Zhen Lin
The University of Texas at Arlington
W
Weishu Deng
The University of Texas at Arlington
R
Ren Wang
Intel Labs
Andrey Kudryavtsev
Andrey Kudryavtsev
Micron
L
Louis Ko
Supermicro
Hui Lu
Hui Lu
Department of Computer Science and Engineering (CSE), the University of Texas at Arlington (UTA)
Cloud ComputingVirtualizationFile and Storage SystemsComputer NetworksComputer Systems
Jia Rao
Jia Rao
The University of Texas at Arlington
Cloud computingdistributed systemsmachine learningreinforcement learning