The Merit of Simple Policies: Buying Performance With Parallelism and System Architecture

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the joint optimization of server count, scheduling policy, and system architecture under a fixed computational budget to minimize average job response time. Using high-resolution traces from Google Cloud production workloads, we develop a multi-stage server cluster model and systematically compare classical policies—including Join-Idle-Queue (JIQ) and Round-Robin (RR)—against state-of-the-art size-aware schedulers. Our findings reveal: (1) an optimal critical server scale that minimizes response time; (2) in high-parallelism or multi-tier architectures, RR and JIQ significantly outperform conventional size-aware policies; and (3) parallelism degree and architectural design exert greater influence on performance than scheduling algorithm sophistication. Collectively, these results establish a new optimization paradigm wherein “architecture–parallelism” dominates over “algorithmic refinement.”

📝 Abstract
While scheduling and dispatching of computational workloads is a well-investigated subject, only recently has Google publicly released a vast high-resolution measurement dataset of its cloud workloads. We revisit dispatching and scheduling algorithms fed by traffic workloads derived from those measurements. The main finding is that mean job response time attains a minimum as the number of servers of the computing cluster is varied, under the constraint that the overall computational budget is kept constant. Moreover, simple policies, such as Join Idle Queue, appear to attain the same performance as more complex, size-based policies for suitably high degrees of parallelism. Further, multi-stage server clusters achieve better performance, definitely outperforming size-based dispatching policies, even with very simple policies such as Round Robin. The takeaway is that parallelism and architecture of computing systems might be powerful knobs to control performance, even more than policies, under realistic workload traffic.
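The fixed-budget effect described in the abstract can be illustrated with a minimal discrete-event sketch (not the paper's simulator): the total service capacity is split evenly over n servers, so adding servers slows each one down while increasing parallelism. All parameter choices below (Poisson arrivals at rate 0.7, Pareto job sizes with shape 2.2 and mean 1, JIQ-style routing with a random fallback) are illustrative assumptions, not values from the paper.

```python
import random

def simulate(n_servers, total_capacity=1.0, lam=0.7, n_jobs=20000, seed=1):
    """Mean response time for n FCFS servers sharing a fixed capacity budget.

    Each server runs at total_capacity / n_servers. Jobs arrive as a Poisson
    process with Pareto-distributed sizes (heavy-tailed, mean 1) and are
    dispatched JIQ-style: to a random idle server if one exists, otherwise
    to the least-backlogged server (a simplifying fallback assumption).
    """
    rng = random.Random(seed)
    speed = total_capacity / n_servers
    free_at = [0.0] * n_servers          # time at which each server next idles
    t, total_resp = 0.0, 0.0
    alpha = 2.2                          # Pareto shape; scale chosen so mean = 1
    xm = (alpha - 1) / alpha
    for _ in range(n_jobs):
        t += rng.expovariate(lam)                     # next arrival time
        size = xm / rng.random() ** (1 / alpha)       # Pareto(alpha) job size
        idle = [i for i in range(n_servers) if free_at[i] <= t]
        k = rng.choice(idle) if idle else min(range(n_servers),
                                              key=lambda i: free_at[i])
        start = max(t, free_at[k])
        free_at[k] = start + size / speed             # FCFS service at server k
        total_resp += free_at[k] - t                  # response = depart - arrive
    return total_resp / n_jobs

for n in (1, 2, 4, 8, 16, 32):
    print(n, round(simulate(n), 2))
```

Sweeping n under a fixed budget is the experiment behind the "critical server scale" observation: too few servers let heavy-tailed jobs block the queue, too many make every server slow.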
Problem

Research questions and friction points this paper is trying to address.

Optimizing job response time in cloud computing clusters.
Comparing simple vs. complex dispatching policies for workload scheduling.
Exploring the impact of parallelism and system architecture on performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes high-resolution cloud workload measurements
Employs simple dispatching policies like Join Idle Queue
Implements multi-stage server clusters for enhanced performance
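The multi-stage idea in the bullets above can be sketched as Round-Robin dispatch over parallel two-stage pipelines. This is a hedged toy model, not the paper's architecture: the even split of the capacity budget over all stage servers, the half-and-half division of each job's work between stages, and the traffic parameters are all assumptions for illustration.

```python
import random

def simulate_two_stage_rr(n_pipes, total_capacity=1.0, lam=0.7,
                          n_jobs=20000, seed=7):
    """Mean response time for Round-Robin dispatch over two-stage pipelines.

    The capacity budget is split evenly over 2 * n_pipes servers. Each job is
    assigned a pipeline in Round-Robin order, does half its work FCFS at
    stage 1, then the other half FCFS at stage 2 of the same pipeline.
    """
    rng = random.Random(seed)
    speed = total_capacity / (2 * n_pipes)
    free1 = [0.0] * n_pipes              # stage-1 server availability times
    free2 = [0.0] * n_pipes              # stage-2 server availability times
    t, total_resp, rr = 0.0, 0.0, 0
    alpha = 2.2                          # Pareto shape; scale gives mean 1
    xm = (alpha - 1) / alpha
    for _ in range(n_jobs):
        t += rng.expovariate(lam)
        size = xm / rng.random() ** (1 / alpha)
        k, rr = rr, (rr + 1) % n_pipes   # Round-Robin pipeline choice
        s1 = max(t, free1[k])            # stage 1: wait for server, then serve
        free1[k] = s1 + (size / 2) / speed
        s2 = max(free1[k], free2[k])     # stage 2: wait for stage-1 finish too
        free2[k] = s2 + (size / 2) / speed
        total_resp += free2[k] - t
    return total_resp / n_jobs

for n in (1, 2, 4, 8):
    print(n, round(simulate_two_stage_rr(n), 2))
```

Even this oblivious policy pipelines work across stages, which is one intuition for why the paper finds architecture mattering more than scheduler sophistication.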
Mert Yildiz
Dept. of Information Engineering, Electronics and Telecommunications (DIET), University of Rome Sapienza, Italy
Alexey Rolich
Dept. of Information Engineering, Electronics and Telecommunications (DIET), University of Rome Sapienza, Italy
Andrea Baiocchi
Dept. of Information Engineering, Electronics and Telecommunications (DIET), University of Rome Sapienza, Italy
Networking · network traffic engineering · performance evaluation