Mitigating context switching in densely packed Linux clusters with Latency-Aware Group Scheduling

📅 2025-08-21

🤖 AI Summary

In high-density Linux clusters, frequent CPU context switches cause significant performance degradation, even when the orchestrator's placement decisions are theoretically sound. Operators commonly compensate by over-provisioning the cluster, which wastes resources. This paper proposes a latency-aware group scheduling optimization: instead of prioritizing per-task fairness, as the standard scheduler does, it treats task completion latency as the primary scheduling objective. Using dynamic cgroup workload characterization, it adaptively regulates run queues and modifies the Linux kernel scheduler to enable fine-grained, low-overhead group-level scheduling. In the experimental evaluation, the approach reduces the required cluster size by 28% while still meeting service-level agreement (SLA) constraints, improving resource utilization and overall system throughput.

📝 Abstract

Cluster orchestrators such as Kubernetes depend on accurate estimates of node capacity and job requirements. Inaccuracies in either lead to poor placement decisions and degraded cluster performance. In this paper, we show that in densely packed workloads, such as serverless applications, CPU context switching overheads can become so significant that a node's performance is severely degraded, even when the orchestrator placement is theoretically sound. In practice, this issue is typically mitigated by over-provisioning the cluster, leading to wasted resources. We show that these context switching overheads arise from both an increase in the average cost of an individual context switch and a higher rate of context switching, which together amplify overhead multiplicatively when managing large numbers of concurrent cgroups, Linux's group scheduling mechanism for managing multi-threaded colocated workloads. We propose and evaluate modifications to the standard Linux kernel scheduler that mitigate these effects, achieving the same effective performance with a 28% smaller cluster size. The key insight behind our approach is to prioritise task completion over low-level per-task fairness, enabling the scheduler to drain contended CPU run queues more rapidly and thereby reduce time spent on context switching.
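To make the multiplicative claim concrete, the following is a minimal sketch that models the share of CPU time lost to switching as the switch rate multiplied by the average cost per switch. The function name and all numbers are hypothetical illustrations, not measurements or code from the paper.

```python
def switching_overhead_fraction(switches_per_sec: float, cost_us: float) -> float:
    """Fraction of one CPU's time spent in context switches (capped at 1.0)."""
    return min(1.0, switches_per_sec * cost_us * 1e-6)


# Hypothetical figures chosen for illustration only.
baseline = switching_overhead_fraction(switches_per_sec=5_000, cost_us=2.0)
# Dense packing: assume the rate grows 4x and the per-switch cost grows 3x.
dense = switching_overhead_fraction(switches_per_sec=20_000, cost_us=6.0)

print(f"baseline overhead: {baseline:.1%}")
print(f"dense overhead:    {dense:.1%}")
print(f"amplification:     {dense / baseline:.0f}x")
```

Under these assumed inputs, a 4x increase in rate combined with a 3x increase in per-switch cost yields a 12x increase in overhead, which is why the two effects together hurt more than either one alone.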

Problem

Research questions and friction points this paper is trying to address.

Reducing CPU context switching overhead in densely packed Linux clusters

Addressing performance degradation from concurrent cgroups in serverless workloads

Improving scheduler efficiency to reduce cluster over-provisioning requirements

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latency-Aware Group Scheduling for Linux clusters

Modifies kernel scheduler to prioritize task completion

Reduces context switching overheads in dense workloads
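The intuition behind prioritizing task completion over per-task fairness can be illustrated with a toy single-CPU simulation. This is only a sketch under simplifying assumptions (a fixed per-switch cost, no I/O, no cgroups); the `simulate` function and its parameters are hypothetical and do not represent the paper's kernel modifications.

```python
from collections import deque


def simulate(work_ms, slice_ms, switch_cost_ms):
    """Run tasks on one CPU; return (makespan, mean completion time, switches).

    Each task runs for at most slice_ms before being requeued. Passing
    slice_ms=float("inf") gives run-to-completion behaviour.
    """
    queue = deque(enumerate(work_ms))
    clock = 0.0
    switches = 0
    finish = {}
    while queue:
        task, remaining = queue.popleft()
        ran = min(remaining, slice_ms)
        clock += ran
        remaining -= ran
        if remaining > 0:
            queue.append((task, remaining))  # not done: back of the queue
        else:
            finish[task] = clock
        # Charge a switch only when a different task runs next.
        if queue and queue[0][0] != task:
            clock += switch_cost_ms
            switches += 1
    mean_completion = sum(finish.values()) / len(finish)
    return clock, mean_completion, switches


# Hypothetical workload: four tasks of 10 ms each, 0.1 ms per switch.
work = [10, 10, 10, 10]
fair = simulate(work, slice_ms=1, switch_cost_ms=0.1)
drain = simulate(work, slice_ms=float("inf"), switch_cost_ms=0.1)

print("fair slicing:      makespan=%.1f mean=%.2f switches=%d" % fair)
print("run-to-completion: makespan=%.1f mean=%.2f switches=%d" % drain)
```

In this toy model, fine-grained fair slicing performs far more switches and finishes every task late, whereas draining the queue task by task lowers both the total time and the mean completion time. Real schedulers must still bound starvation, which this sketch ignores.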
