🤖 AI Summary
To address GPU memory capacity and bandwidth limitations, as well as the inefficient integration of heterogeneous storage media (DRAM and SSD), this paper proposes a CXL 3.0–based GPU memory expansion architecture. The method introduces three key innovations: (1) a silicon-verified, RTL-level CXL controller that enables unified management of heterogeneous memory across multiple root ports; (2) hardware-coordinated memory-semantic extensions and a low-latency interconnect design achieving sub-100 ns round-trip latency; and (3) a speculative-read and deterministic-write mechanism that effectively masks the latency variability of the backend media. Experimental evaluation shows that, compared with state-of-the-art GPU memory expansion approaches, the proposed architecture achieves significantly higher bandwidth and reduces latency by an order of magnitude. It delivers a high-bandwidth, low-latency, and scalable unified memory space, directly supporting large-model training and high-performance computing workloads.
📝 Abstract
This work introduces a CXL-based GPU memory expansion solution, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAM and/or SSDs). We developed and fabricated in silicon a custom CXL controller, integrated at the hardware RTL level, that achieves two-digit-nanosecond round-trip latency, the first such result in the field. The design also includes speculative-read and deterministic-store mechanisms that manage read and write operations efficiently, hiding the latency variation of the endpoint's backend media. Performance evaluations show that our approach significantly outperforms existing methods, marking a substantial advance in GPU memory expansion technology.
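To make the speculative-read / deterministic-store idea concrete, here is a minimal software sketch of the concept, not the authors' RTL. All names here (`SpeculativeController`, the dict-based backend) are invented for illustration; the real controller implements this in hardware on the CXL datapath.

```python
class SpeculativeController:
    """Toy model of a memory-controller front-end hiding backend latency.

    - Deterministic store: a write is acknowledged as soon as it lands in a
      small write buffer, so the requester sees a fixed latency regardless
      of how slow the backend medium (e.g. an SSD) is; the buffer is
      drained to the backend in the background.
    - Speculative read: a read is served from the write buffer or a
      prefetch cache when possible, and the controller speculatively
      fetches a predicted next address to mask backend latency on the
      following access.
    """

    def __init__(self, backend):
        self.backend = backend      # slow medium, modeled as a dict
        self.write_buffer = {}      # pending stores, acked immediately
        self.prefetch_cache = {}    # speculatively fetched lines

    def write(self, addr, value):
        # Deterministic store: ack at buffer-insert time; backend
        # latency is invisible to the writer.
        self.write_buffer[addr] = value

    def drain(self):
        # Background flush of buffered stores to the backend medium.
        for addr, value in self.write_buffer.items():
            self.backend[addr] = value
        self.write_buffer.clear()

    def read(self, addr, next_addr=None):
        # Serve from buffer/cache first; only a miss pays backend cost.
        if addr in self.write_buffer:
            value = self.write_buffer[addr]
        elif addr in self.prefetch_cache:
            value = self.prefetch_cache.pop(addr)
        else:
            value = self.backend[addr]
        # Speculatively fetch the predicted next line.
        if next_addr is not None and next_addr in self.backend:
            self.prefetch_cache[next_addr] = self.backend[next_addr]
        return value
```

In hardware the same structure is a write-posting buffer plus a prefetcher in front of the media controller; this sketch only shows the ordering and hit/miss logic, not timing.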