AI Summary
Physical constraints fundamentally limit the scalability of CXL memory expanders in terms of capacity and channel bandwidth, hindering efficient increases in effective memory capacity. To address this challenge, this work proposes IBEX, a block-level compression architecture tailored for modern memory expanders. IBEX introduces an innovative demand-driven compression strategy that distinguishes between hot and cold data, compressing only cold pages to preserve the access performance of hot data. It incorporates a low-overhead cold-page identification mechanism and a shadow promotion scheme to mitigate decompression latency. Furthermore, IBEX enhances internal bandwidth utilization through metadata compression and a multi-block cooperative layout optimization. Experimental results demonstrate that IBEX achieves an average performance speedup of 1.28× to 1.40× over state-of-the-art block-level compression approaches.
Abstract
As the memory channel count is confined by physical dimensions, memory expanders appear to be a promising approach to extending memory capacity and channels by augmenting the existing I/O interface (e.g., PCIe) with memory-semantic protocols like CXL. Unfortunately, the physical constraints of a computing system restrict scalable capacity expansion with memory expanders. In this work, we propose IBEX, a block-level compression scheme for modern memory expanders, to achieve larger effective memory capacity. Given the performance overhead associated with block-level compression algorithms (e.g., LZ77), IBEX employs a promotion-based approach: only cold data is compressed, whereas hot data remains uncompressed. Our key innovation is internal bandwidth-efficient block management that precisely identifies cold pages with minimal metadata access overhead. Still, the promotion-based approach poses several performance-related challenges at the design level. Therefore, we also propose a shadowed promotion scheme that temporarily postpones the deallocation of promoted data, thereby mitigating the performance penalty incurred by demotion (i.e., recompression). Furthermore, we optimize our compression scheme by compacting metadata and co-locating multiple target blocks for efficient bandwidth utilization. Consequently, IBEX achieves average speedups of 1.28x-1.40x over state-of-the-art promotion-based block-level approaches. We open-source IBEX at https://github.com/relacslab/ibex-ics26.
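The promotion-based policy described above can be illustrated with a small sketch. This is not IBEX's implementation; it is a toy model, assuming a simple access-count hotness heuristic and using `zlib` as a stand-in for the block compressor. All class names, thresholds, and methods (`Ibex`, `HOT_THRESHOLD`, `demote_cold_pages`) are hypothetical. The shadow copy kept after promotion lets a later demotion reuse the old compressed data instead of recompressing, which is the performance penalty the shadowed promotion scheme aims to avoid.

```python
import zlib

class Ibex:
    """Toy sketch of a promotion-based block compression policy.

    Cold pages are stored compressed; hot pages stay uncompressed.
    Reading a cold page decompresses and promotes it, but its compressed
    copy is retained as a "shadow" so a later demotion can reuse it
    without recompressing. Names and thresholds are illustrative only.
    """

    HOT_THRESHOLD = 2  # hypothetical access count separating hot from cold

    def __init__(self):
        self.hot = {}     # page_id -> raw bytes (uncompressed, hot)
        self.cold = {}    # page_id -> compressed bytes (cold)
        self.shadow = {}  # page_id -> compressed copy kept after promotion
        self.hits = {}    # page_id -> decayed access count

    def write(self, page_id: int, data: bytes) -> None:
        self.hot[page_id] = data
        self.hits[page_id] = self.HOT_THRESHOLD
        self.shadow.pop(page_id, None)  # stale shadow copy is now invalid

    def read(self, page_id: int) -> bytes:
        self.hits[page_id] = self.hits.get(page_id, 0) + 1
        if page_id in self.hot:
            return self.hot[page_id]
        # Promotion: decompress, and keep the compressed copy as a shadow
        # so a future demotion does not have to recompress.
        compressed = self.cold.pop(page_id)
        raw = zlib.decompress(compressed)
        self.shadow[page_id] = compressed
        self.hot[page_id] = raw
        return raw

    def demote_cold_pages(self) -> None:
        """Background step: compress pages whose decayed count fell below the threshold."""
        for page_id in list(self.hot):
            self.hits[page_id] -= 1  # simple decay of the hotness counter
            if self.hits[page_id] < self.HOT_THRESHOLD:
                raw = self.hot.pop(page_id)
                # Reuse the shadow copy if one exists; otherwise recompress.
                self.cold[page_id] = self.shadow.pop(page_id, None) or zlib.compress(raw)
```

A real expander would track hotness per block with compact metadata and bound the shadow area's capacity; this sketch only shows how shadowed promotion turns a demotion into a cheap pointer move when the shadow copy is still valid.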