🤖 AI Summary
In NUMA systems, poor coordination between the CPU scheduler and the memory manager leads to mismatched thread and page-table placement. This paper proposes Phoenix, an integrated CPU scheduler and memory manager that addresses this issue. The approach introduces: (1) the first joint thread and page-table placement mechanism; (2) differentiated migration and on-demand replication policies for data pages versus page-table pages; and (3) memory bandwidth throttling driven by hardware performance counter feedback to maintain QoS and suppress cross-socket coherence overhead. Implemented as a loadable Linux kernel module, Phoenix requires no application modifications. Evaluation on real hardware demonstrates that, compared to state-of-the-art approaches, Phoenix reduces CPU cycles by 2.09× and page-walk cycles by 1.58×, significantly improving NUMA locality and scalability.
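The differentiated policy in point (2) can be illustrated with a minimal sketch. All names here (`page_kind`, `choose_action`, the thresholds) are hypothetical and not Phoenix's actual kernel interface; the sketch only captures the stated idea that remotely accessed data pages are migrated, while page-table pages walked from multiple nodes are replicated on demand rather than moved:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page classification; illustrative only. */
enum page_kind { DATA_PAGE, PAGE_TABLE_PAGE };

enum placement_action { LEAVE, MIGRATE, REPLICATE };

/* Sketch of a differentiated placement policy: a data page has a
 * single writable copy, so it follows the accessing threads; a
 * page-table page is read-mostly during walks, so when threads on
 * several nodes walk it, a per-node replica removes remote walks
 * instead of just moving the bottleneck to another socket. */
enum placement_action choose_action(enum page_kind kind,
                                    int accessing_nodes,
                                    bool remote_hot)
{
    if (!remote_hot)
        return LEAVE;           /* accesses are mostly local already */
    if (kind == DATA_PAGE)
        return MIGRATE;         /* move the single copy to the hot node */
    return (accessing_nodes > 1) ? REPLICATE : MIGRATE;
}
```

The key design point the summary makes is that migration and replication are not interchangeable: replicating data pages would create coherence problems for writes, while migrating a shared page table merely relocates the remote-walk penalty.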
📝 Abstract
The emergence of symmetric multi-processing (SMP) systems with non-uniform memory access (NUMA) has prompted extensive research on process and data placement to mitigate the performance impact of NUMA on applications. However, existing solutions often overlook coordination between the CPU scheduler and the memory manager, leading to inefficient thread and page-table placement. Moreover, replication techniques employed to improve locality suffer from redundant replicas, scalability barriers, and performance degradation due to memory bandwidth contention and inter-socket interference. In this paper, we present Phoenix, a novel integrated CPU scheduler and memory manager with an on-demand page-table replication mechanism. Phoenix integrates the CPU scheduling and memory management subsystems, enabling coordinated thread and page-table placement. By differentiating between data pages and page-table pages, Phoenix migrates or replicates page tables directly based on application behavior. Additionally, Phoenix employs a memory bandwidth management mechanism to maintain Quality of Service (QoS) while mitigating coherence maintenance overhead. We implemented Phoenix as a loadable kernel module for Linux, ensuring compatibility with legacy applications and ease of deployment. Our evaluation on real hardware demonstrates that Phoenix reduces CPU cycles by 2.09× and page-walk cycles by 1.58× compared to state-of-the-art solutions.
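The bandwidth management the abstract mentions can be sketched as a simple feedback controller: sample cross-socket traffic (in practice from uncore performance counters) and adjust a throttle level against a QoS budget. The function name, the 10-level scale, and the 90% dead-band are assumptions for illustration, not Phoenix's actual mechanism:

```c
#include <assert.h>

/* Illustrative feedback step: tighten the throttle (fewer outstanding
 * remote requests) when measured cross-socket bandwidth exceeds the
 * budget, relax it when there is clear headroom, and hold steady in a
 * small dead-band to avoid oscillation. Constants are assumptions. */
static int update_throttle(int level, long measured_mbps, long budget_mbps)
{
    if (measured_mbps > budget_mbps && level > 1)
        return level - 1;                       /* over budget: tighten */
    if (measured_mbps < budget_mbps * 9 / 10 && level < 10)
        return level + 1;                       /* headroom: relax */
    return level;                               /* dead-band: hold */
}
```

Running such a step periodically keeps replica-maintenance and migration traffic from starving application memory accesses, which is the QoS goal the abstract describes.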