🤖 AI Summary
HPC applications suffer severe performance degradation and low memory utilization due to high-latency remote memory access. To address this, we propose the first data-object-granularity memory disaggregation framework. Our approach enables transparent and efficient memory resource pooling via fine-grained data object identification and migration, a dual-buffer prefetching mechanism guided by access pattern prediction, multi-threaded concurrency control, and a quantified local memory cost model. Key contributions include: (1) the first object-level memory disaggregation management scheme; (2) a lightweight prefetching strategy that mitigates remote latency overhead; and (3) an analytically tractable memory capacity–performance trade-off model. Evaluated across eight representative HPC workloads, our framework reduces local memory footprint by 63% on average while incurring ≤16% end-to-end performance loss—significantly improving memory scalability and overall system resource utilization.
📝 Abstract
Memory disaggregation is promising to scale memory capacity and improves utilization in HPC systems. However, the performance overhead of accessing remote memory poses a significant challenge, particularly for compute-intensive HPC applications where execution times are highly sensitive to data locality. In this work, we present DOLMA, a Data Object Level M emory dis Aggregation framework designed for HPC applications. DOLMA intelligently identifies and offloads data objects to remote memory, while providing quantitative analysis to decide a suitable local memory size. Furthermore, DOLMA leverages the predictable memory access patterns typical in HPC applications and enables remote memory prefetch via a dual-buffer design. By carefully balancing local and remote memory usage and maintaining multi-thread concurrency, DOLMA provides a flexible and efficient solution for leveraging disaggregated memory in HPC domains while minimally compromising application performance. Evaluating with eight HPC workloads and computational kernels, DOLMA limits performance degradation to less than 16% while reducing local memory usage by up to 63%, on average.