🤖 AI Summary
In hardware memory disaggregation (HMD) systems, remote memory accesses travel over a network, and the resulting contention raises page migration overhead and degrades application performance. Existing migration policies ignore how migration costs vary with network contention, so they fail to achieve the best memory placement. This paper proposes INDIGO, the first network-aware page migration framework. Integrated into the Linux kernel, it combines fine-grained page-level telemetry, lightweight real-time network state monitoring, and an online reinforcement learning scheduler to model and adapt to the variable migration costs that network dynamics induce. Evaluated on a real HMD prototype, INDIGO improves application performance by up to 50–70% over state-of-the-art page migration policies while reducing network traffic by up to 2×.
📝 Abstract
Hardware memory disaggregation (HMD) is an emerging technology that enables access to remote memory, thereby creating expansive memory pools and reducing memory underutilization in datacenters. However, accessing remote memory over a network introduces a significant challenge: increased contention, which can severely degrade application performance. To reduce the performance penalty of using remote memory, the operating system uses page migration to promote frequently accessed pages closer to the processor. Previously proposed page migration mechanisms do not achieve the best performance in HMD systems, though, because they are oblivious to the variable page transfer costs caused by network contention. To address this limitation, we present INDIGO, a network-aware page migration framework that uses novel page telemetry and a learning-based approach for network adaptation. We implemented INDIGO in the Linux kernel and evaluated it with common cloud and HPC applications on a real disaggregated memory system prototype. Our evaluation shows that INDIGO improves application performance by up to 50–70% compared to other state-of-the-art page migration policies and reduces network traffic by up to 2×.