🤖 AI Summary
To address NUMA mismatch and performance degradation caused by suboptimal virtual resource mapping in multi-level NUMA disaggregated systems, this paper proposes a NUMA-aware dynamic resource scheduling framework. Methodologically, it introduces the first multi-level NUMA co-mapping algorithm tailored for disaggregated architectures, integrating CPU pinning, cross-node memory migration, application-characteristic-aware modeling, and lightweight runtime monitoring, implemented via extensions to the Linux kernel scheduler. Unlike conventional single-node NUMA schedulers, the framework overcomes hierarchical isolation by jointly optimizing performance, resource contention, and utilization across NUMA levels. Evaluated on a six-node disaggregated system (288 cores, 1 TB memory), it achieves 37–62% performance improvement for real-world workloads, including graph database applications.
📝 Abstract
Disaggregated systems have a novel architecture motivated by the requirements of resource-intensive applications such as social networking, search, and in-memory databases. The total amount of resources, such as memory and CPU cores, is very large in such systems. However, the distributed topology of disaggregated server systems results in non-uniform access latency and performance, with NUMA effects inside each box as well as additional access latency for remote resources. In this work, we study the effects of complex NUMA topologies on application performance and propose a method for improved, NUMA-aware mapping for virtualized environments running on disaggregated systems. Our mapping algorithm is based on pinning of virtual cores and/or migration of memory across a disaggregated system, and it takes into account application performance, resource contention, and utilization. The proposed method is evaluated on a system with 288 cores and around 1 TB of memory, composed of six disaggregated commodity servers, through a combination of benchmarks and real applications such as memory-intensive graph databases. Our evaluation demonstrates significant improvement over the vanilla resource mapping methods. Overall, the mapping algorithm improves performance by a significant margin compared to the default Linux scheduler used in the system.
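The idea of NUMA-aware pinning can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a simple greedy heuristic that places virtual cores on the NUMA nodes closest to the node holding the VM's memory, and it ignores contention and utilization. The function names, the four-node topology, and the distance values (10 for local, 21 for same-server, 40 for cross-server) are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical topology: nodes 0-1 sit in one server, nodes 2-3 in another.
# Values mimic a NUMA distance table (10 = local); the cross-server value
# of 40 is an assumed figure, not a measurement from the paper.
DISTANCE: List[List[int]] = [
    [10, 21, 40, 40],
    [21, 10, 40, 40],
    [40, 40, 10, 21],
    [40, 40, 21, 10],
]


def place_vcpus(distance: List[List[int]], free_cores: Dict[int, int],
                memory_node: int, num_vcpus: int) -> List[int]:
    """Greedily assign each vCPU to the nearest node that still has a free core.

    Returns a list with one NUMA node id per vCPU.
    """
    remaining = dict(free_cores)  # copy so the caller's dict is not mutated
    # Visit nodes in order of increasing distance from the memory node;
    # ties are broken by node id to keep the result deterministic.
    order = sorted(remaining, key=lambda n: (distance[memory_node][n], n))
    placement: List[int] = []
    for node in order:
        while remaining[node] > 0 and len(placement) < num_vcpus:
            placement.append(node)
            remaining[node] -= 1
    if len(placement) < num_vcpus:
        raise ValueError("not enough free cores for the requested vCPUs")
    return placement


def placement_cost(distance: List[List[int]], memory_node: int,
                   placement: List[int]) -> int:
    """Sum of distances between each vCPU's node and the memory node."""
    return sum(distance[memory_node][node] for node in placement)


FREE_CORES = {0: 2, 1: 4, 2: 4, 3: 4}
aware = place_vcpus(DISTANCE, FREE_CORES, memory_node=0, num_vcpus=4)
spread = [0, 1, 2, 3]  # topology-unaware placement, one vCPU per node
aware_cost = placement_cost(DISTANCE, 0, aware)
spread_cost = placement_cost(DISTANCE, 0, spread)
```

With these assumed values, the greedy placement keeps all four virtual cores inside the server that holds the memory, while the spread placement sends two of them across the server boundary. A real mapper of the kind the abstract describes would additionally weigh contention, utilization, and the option of migrating memory instead of moving cores.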