CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies

📅 2025-06-18
🤖 AI Summary
To address GPU memory capacity and bandwidth limitations, as well as the low integration efficiency of heterogeneous storage (DRAM/SSD), this paper proposes a CXL 3.0–based GPU memory expansion architecture. The method introduces three key innovations: (1) a silicon-verified RTL-level CXL controller enabling unified management of heterogeneous memory across multiple root ports; (2) hardware-coordinated memory semantic extensions and a low-latency interconnect design achieving sub-100 ns round-trip latency; and (3) a speculative read and deterministic store mechanism that masks backend media latency variability. Experimental evaluation demonstrates that, compared to state-of-the-art GPU memory expansion approaches, the proposed architecture achieves significantly higher bandwidth and reduces latency by an order of magnitude. It delivers a high-bandwidth, low-latency, and scalable unified memory space, directly supporting large-model training and high-performance computing workloads.

📝 Abstract
This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). We developed and siliconized a custom CXL controller integrated at the hardware RTL level, achieving two-digit nanosecond round-trip latency, the first in the field. This study also includes speculative read and deterministic store mechanisms that manage read and write operations efficiently and hide variation in the endpoint's backend media latency. Performance evaluations reveal that our approach significantly outperforms existing methods, marking a substantial advancement in GPU storage technology.
Problem

Research questions and friction points this paper is trying to address.

Expanding GPU memory using CXL technology
Reducing latency with a custom CXL controller
Managing read and write operations efficiently despite backend media latency variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU storage expansion using CXL technology
Custom CXL controller with two-digit nanosecond round-trip latency
Speculative read and deterministic store mechanisms
Donghyun Gouk
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Seungkwan Kang
Graduate Student of Electrical Engineering (EE), KAIST
Computer Architecture
Seungjun Lee
KAIST, Daejeon, South Korea
Jiseon Kim
KAIST
Natural Language Processing, Computational Social Science
Kyungkuk Nam
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Eojin Ryu
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Sangwon Lee
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Dongpyung Kim
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Junhyeok Jang
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Hanyeoreum Bae
Next-Generation Silicon and Research Division, Panmnesia, Inc., Daejeon, South Korea
Myoungsoo Jung
The KAIST Endowed Chair Professor | Full Professor, Department of Electrical Engineering, KAIST
Computer Architecture, Solid State Drive, Non-Volatile Memory, CXL, Operating Systems