DPC: A Distributed Page Cache over CXL

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inefficiencies of traditional distributed file systems, where independent node-level caching leads to DRAM underutilization and high consistency overhead. To overcome these limitations, the authors propose and implement a CXL 3.0–based distributed page cache that aggregates cluster memory into a unified cache pool. The system preserves standard file system interfaces while enforcing a single-copy invariant at page granularity (each page has exactly one resident DRAM copy, held by its owner node), and it leverages CXL's remote memory semantics so that other nodes can access that copy without replicating it. The design targets both data redundancy and consistency overhead. Evaluated on a CXL 3.0 emulation framework, the system achieves up to a 12.4× speedup and a 5.6× geometric-mean speedup across real-world and representative data-sharing workloads.

📝 Abstract

Modern distributed file systems rely on uncoordinated, per-node page caches that replicate hot data locally across the cluster. While ensuring fast local access, this architecture underutilizes aggregate cluster DRAM capacity through massive data redundancy and incurs prohibitive coherence overhead via heavyweight, lock-based protocols. In this paper, we focus on the design of a distributed page cache that treats the entire cluster's main memory as a single cache budget while preserving standard file-system interfaces and semantics. We present Distributed Page Cache (DPC), an OS-level, distributed page cache built on top of Compute Express Link (CXL) 3.0 memory semantics. DPC enforces a single-copy invariant at page granularity: each file page has exactly one owner node holding the sole resident DRAM copy, and other nodes access it via CXL-based remote mappings rather than creating replicas of the page. DPC is implemented end-to-end on a CXL-based emulation framework that models multi-host CXL 3.0 memory fabrics, enabling detailed evaluation in the absence of widespread hardware. Across real-world and representative data-sharing workloads, DPC delivers speedups of up to 12.4×, with a geometric-mean speedup of 5.6×.
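
The single-copy invariant described in the abstract can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not DPC's implementation: the `Directory` and `Node` classes, the `read` method, and the "first node to touch a page becomes its owner" policy are all hypothetical, and a plain Python reference stands in for a CXL remote mapping. Writes, eviction, and ownership migration are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

PageId = Tuple[str, int]  # (file path, page index); hypothetical naming


@dataclass
class Directory:
    """Hypothetical cluster-wide map from each cached page to its single owner."""
    owner: Dict[PageId, int] = field(default_factory=dict)


@dataclass
class Node:
    node_id: int
    directory: Directory
    # Pages whose sole DRAM copy lives on this node.
    resident: Dict[PageId, bytes] = field(default_factory=dict)
    # Pages this node reaches through a (simulated) remote mapping.
    remote_maps: Set[PageId] = field(default_factory=set)

    def read(
        self,
        page: PageId,
        cluster: Dict[int, "Node"],
        load_from_storage: Callable[[PageId], bytes],
    ) -> bytes:
        owner_id = self.directory.owner.get(page)
        if owner_id is None:
            # Assumed policy: the first node to touch a page becomes its owner.
            self.resident[page] = load_from_storage(page)
            self.directory.owner[page] = self.node_id
            return self.resident[page]
        if owner_id == self.node_id:
            return self.resident[page]  # local hit on the sole copy
        # Remote hit: record a mapping and read the owner's copy in place.
        # No replica is created in this node's `resident` table.
        self.remote_maps.add(page)
        return cluster[owner_id].resident[page]
```

In this model, if three nodes read the same page, storage is read once, exactly one node holds the page in `resident`, and the other two hold only an entry in `remote_maps`. A conventional per-node cache would instead hold three copies of the same page.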

Problem

Research questions and friction points this paper is trying to address.

distributed page cache

CXL

memory coherence

data redundancy

cluster DRAM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed Page Cache

CXL 3.0

Single-Copy Invariant