Improving Nonpreemptive Multiserver Job Scheduling with Quickswap

📅 2025-09-01

📈 Citations: 0

✨ Influential: 0

career value

211K/year

🤖 AI Summary

This paper addresses the non-preemptive scheduling of stateful jobs across multiple servers in data centers—a problem characterized by the need to avoid costly state save/restore operations while mitigating poor resource utilization under FCFS and high waiting-time variability under Minimum-Service-First (MSF), both of which degrade response time. To this end, we propose MSF-QuickSwap (MSFQ), a novel scheduling strategy featuring a periodic priority-swapping mechanism that dynamically balances job priorities without compromising system utilization, thereby significantly reducing waiting-time variance. We theoretically prove that MSFQ strictly dominates MSF for both single-core and full-core job requests. Extensive simulations under realistic workloads show that MSFQ reduces average response time by 27.4% over MSF and by 39.1% over FCFS, while simultaneously improving stability, fairness, and overall performance.

Technology Category

Application Category

📝 Abstract

Modern data center workloads are composed of multiserver jobs, computational jobs that require multiple CPU cores in order to run. A data center server can run many multiserver jobs in parallel, as long as it has sufficient resources to meet their demands. However, multiserver jobs are generally stateful, meaning that job preemptions incur significant overhead from saving and reloading the state associated with running jobs. Hence, most systems try to avoid these costly job preemptions altogether. Given these constraints, a scheduling policy must determine what set of jobs to run in parallel at each moment in time to minimize the mean response time across a stream of arriving jobs. Unfortunately, simple non-preemptive policies such as FCFS may leave many cores idle, resulting in high mean response times or even system instability. Our goal is to design and analyze non-preemptive scheduling policies for multiserver jobs that maintain high system utilization to achieve low mean response time. One well-known non-preemptive policy, Most Servers First (MSF), prioritizes jobs with higher core requirements and achieves high resource utilization. However, MSF causes extreme variability in job waiting times, and can perform significantly worse than FCFS in practice. To address this, we propose and analyze a class of scheduling policies called MSF-Quick Swap (MSFQ) that performs well. MSFQ reduces the variability of job waiting times by periodically granting priority to other jobs in the system. We provide both stability results and an analysis of mean response time under MSFQ to prove that our policy dramatically outperforms MSF in the case where jobs request one core or all the cores. In more complex cases, we evaluate MSFQ in simulation. We show that, with some additional optimization, variants of the MSFQ policy can greatly outperform MSF and FCFS on real-world multiserver job workloads.

Problem

Research questions and friction points this paper is trying to address.

Minimizing response time for nonpreemptive multiserver job scheduling

Avoiding costly job preemptions while maintaining high system utilization

Reducing job waiting time variability in multiserver job scheduling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes MSF-QuickSwap scheduling policy

Reduces job waiting time variability

Maintains high system utilization non-preemptively

🔎 Similar Papers

Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions