🤖 AI Summary
Existing file systems do not exploit CXL shared memory for lock-free coordination among multiple hosts. This work proposes DaxFS, a decentralized Linux file system for CXL 3.0 shared memory that relies solely on cmpxchg atomic operations for cross-host coordination. It introduces a CAS-based hash overlay that enables lock-free concurrent writes without a central coordinator, and a multi-host clock eviction algorithm (MH-clock) for managing a cooperative shared page cache in DAX memory. Multi-host correctness is validated on QEMU-emulated CXL 3.0, where DaxFS maintains over 99% CAS accuracy under cross-host contention with no lost updates. On single-host DRAM-backed DAX, it delivers up to 2.68× higher random write throughput (4 threads) and 1.18× higher random read throughput (64 KB) than tmpfs. Preliminary GPU microbenchmarks indicate that GPU threads can perform page cache operations at PCIe 5.0 bandwidth limits.
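The abstract does not give the layout of the CAS-based hash overlay. The sketch below is a rough illustration of the general pattern only, not DaxFS's implementation. It uses an open-addressed hash table in which a writer claims a slot by CAS-ing its key from empty. A writer then publishes new data by CAS-ing the slot's value from the version it last read. All names (`overlay`, `overlay_claim`, `overlay_update`, `overlay_lookup`), the 64-slot size, and the key/value encoding are hypothetical. C11 atomics on ordinary process memory stand in for cmpxchg on CXL-shared memory; on x86-64, compilers typically emit `lock cmpxchg` for these calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlay: a fixed-size open-addressed table that would live in
 * the shared region. Key 0 marks an empty slot; value 0 means "no data yet". */
#define OVERLAY_SLOTS 64u
#define EMPTY_KEY 0u

struct overlay_slot {
    _Atomic uint64_t key;   /* e.g. (inode << 32) | page index; never 0 */
    _Atomic uint64_t value; /* e.g. offset of the current data block */
};

struct overlay {
    struct overlay_slot slots[OVERLAY_SLOTS];
};

static void overlay_init(struct overlay *ov)
{
    for (size_t i = 0; i < OVERLAY_SLOTS; i++) {
        atomic_init(&ov->slots[i].key, EMPTY_KEY);
        atomic_init(&ov->slots[i].value, 0);
    }
}

static size_t overlay_hash(uint64_t key)
{
    /* Bit mixing (first half of MurmurHash3's fmix64) to spread keys. */
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (size_t)(key % OVERLAY_SLOTS);
}

/* Return the slot that owns `key`, claiming an empty slot with CAS if needed.
 * Returns NULL when the table is full. */
static struct overlay_slot *overlay_claim(struct overlay *ov, uint64_t key)
{
    assert(key != EMPTY_KEY);
    size_t start = overlay_hash(key);
    for (size_t i = 0; i < OVERLAY_SLOTS; i++) {
        struct overlay_slot *s = &ov->slots[(start + i) % OVERLAY_SLOTS];
        uint64_t expected = EMPTY_KEY;
        /* Either we install our key, or `expected` receives the key already
         * there; if that is our key, another writer claimed it first. */
        if (atomic_compare_exchange_strong(&s->key, &expected, key) || expected == key)
            return s;
        /* Slot belongs to a different key: keep probing. */
    }
    return NULL;
}

/* Return the current value for `key`, or 0 if the key has no slot yet. */
static uint64_t overlay_lookup(struct overlay *ov, uint64_t key)
{
    size_t start = overlay_hash(key);
    for (size_t i = 0; i < OVERLAY_SLOTS; i++) {
        struct overlay_slot *s = &ov->slots[(start + i) % OVERLAY_SLOTS];
        uint64_t k = atomic_load(&s->key);
        if (k == key)
            return atomic_load(&s->value);
        if (k == EMPTY_KEY)
            return 0; /* slots are never freed, so an empty slot ends the chain */
    }
    return 0;
}

/* Publish `new_val` for `key` only if the slot still holds `old_val`.
 * A false return means another writer won the race (or the table is full).
 * No lock is taken at any point. */
static bool overlay_update(struct overlay *ov, uint64_t key,
                           uint64_t old_val, uint64_t new_val)
{
    struct overlay_slot *s = overlay_claim(ov, key);
    if (s == NULL)
        return false;
    return atomic_compare_exchange_strong(&s->value, &old_val, new_val);
}
```

In this sketch, a writer that loses the CAS in `overlay_update` re-reads the value with `overlay_lookup` and retries. Concurrent writers therefore serialize on a single value word rather than on a lock, and a stale update is rejected instead of silently overwriting a newer one.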
📝 Abstract
CXL (Compute Express Link) enables multiple hosts to share byte-addressable memory with hardware cache coherence, but no existing filesystem exploits this for lock-free multi-host coordination. We present DaxFS, a Linux filesystem for CXL shared memory that uses cmpxchg atomic operations, which CXL makes coherent across host boundaries, as its sole coordination primitive. A CAS-based hash overlay enables lock-free concurrent writes from multiple hosts without any centralized coordinator. A cooperative shared page cache with a novel multi-host clock eviction algorithm (MH-clock) provides demand-paged caching in shared DAX memory, with fully decentralized victim selection via cmpxchg. We validate multi-host correctness using QEMU-emulated CXL 3.0, where two virtual hosts share a memory region with TCP-forwarded atomics. Under cross-host contention, DaxFS maintains >99% CAS accuracy with no lost updates. On single-host DRAM-backed DAX, DaxFS exceeds tmpfs throughput across all write workloads, achieving up to 2.68x higher random write throughput with 4 threads and 1.18x higher random read throughput at 64 KB. Preliminary GPU microbenchmarks show that the cmpxchg-based design extends to GPU threads performing page cache operations at PCIe 5.0 bandwidth limits.
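The abstract does not describe MH-clock's internals. The following is a generic CLOCK (second-chance) sweep in which every state change goes through compare-and-swap. That is the property the abstract attributes to MH-clock's decentralized victim selection, but this is not the authors' algorithm. The names (`page_cache`, `frame_touch`, `select_victim`, the `F_*` flags) and the one-word-per-frame layout are hypothetical. C11 atomics on ordinary memory stand in for cmpxchg on the CXL-shared region.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical frame-state word, one per page-cache frame in shared memory. */
#define NFRAMES 4u
#define F_STATE_MASK 0x3u
#define F_FREE 0u
#define F_VALID 1u    /* holds a cached page, evictable */
#define F_EVICTING 2u /* claimed as a victim by some host */
#define F_REF 0x4u    /* CLOCK reference bit */

struct page_cache {
    _Atomic uint32_t hand;            /* shared clock hand */
    _Atomic uint32_t frames[NFRAMES]; /* per-frame state + reference bit */
};

static void pc_init(struct page_cache *pc)
{
    atomic_init(&pc->hand, 0);
    for (uint32_t i = 0; i < NFRAMES; i++)
        atomic_init(&pc->frames[i], F_FREE);
}

/* Mark a frame as holding a valid, unreferenced page (e.g. after a fill). */
static void frame_fill(struct page_cache *pc, uint32_t idx)
{
    assert(idx < NFRAMES);
    atomic_store(&pc->frames[idx], F_VALID);
}

/* Record an access: set the reference bit with CAS, only while VALID. */
static void frame_touch(struct page_cache *pc, uint32_t idx)
{
    assert(idx < NFRAMES);
    uint32_t cur = atomic_load(&pc->frames[idx]);
    while ((cur & F_STATE_MASK) == F_VALID && !(cur & F_REF)) {
        /* On failure `cur` is refreshed and the condition re-checked. */
        if (atomic_compare_exchange_weak(&pc->frames[idx], &cur, cur | F_REF))
            break;
    }
}

/* Advance the shared hand by one with CAS; returns the frame it pointed at. */
static uint32_t clock_tick(struct page_cache *pc)
{
    uint32_t cur = atomic_load(&pc->hand);
    while (!atomic_compare_exchange_weak(&pc->hand, &cur, (cur + 1) % NFRAMES))
        ; /* `cur` now holds the latest hand position; retry */
    return cur;
}

/* Second-chance sweep: clear set reference bits, and claim the first VALID
 * frame whose bit is clear by CAS-ing it to EVICTING. Any host may run this
 * concurrently; the CAS ensures each frame has at most one evictor.
 * Returns the frame index, or -1 if nothing was claimed in max_steps. */
static int select_victim(struct page_cache *pc, unsigned max_steps)
{
    for (unsigned step = 0; step < max_steps; step++) {
        uint32_t idx = clock_tick(pc);
        uint32_t cur = atomic_load(&pc->frames[idx]);
        if ((cur & F_STATE_MASK) != F_VALID)
            continue; /* free, or already claimed by another host */
        if (cur & F_REF) {
            uint32_t cleared = cur & ~F_REF;
            /* Losing this CAS just means the frame changed; move on. */
            (void)atomic_compare_exchange_strong(&pc->frames[idx], &cur, cleared);
            continue;
        }
        uint32_t claimed = (cur & ~F_STATE_MASK) | F_EVICTING;
        if (atomic_compare_exchange_strong(&pc->frames[idx], &cur, claimed))
            return (int)idx;
    }
    return -1;
}
```

In this sketch, the host that wins the CAS to `F_EVICTING` owns the frame until it writes the page back and stores `F_FREE`. Other hosts sweeping at the same time skip that frame, so victim selection needs neither a lock nor a coordinator.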