🤖 AI Summary
This paper investigates the joint optimization of server count, scheduling policy, and system architecture under a fixed computational budget to minimize mean job response time. Using high-resolution traces from Google Cloud production workloads, we develop a multi-stage server cluster model and systematically compare classical policies—including Join-Idle-Queue (JIQ) and Round-Robin (RR)—against state-of-the-art size-aware schedulers. Our findings reveal: (1) a critical server count at which mean response time is minimized; (2) in high-parallelism or multi-tier architectures, RR and JIQ significantly outperform conventional size-aware policies; and (3) the degree of parallelism and the architectural design exert greater influence on performance than the sophistication of the scheduling algorithm. Collectively, these results suggest an optimization paradigm in which “architecture and parallelism” dominate over “algorithmic refinement.”
📝 Abstract
While the scheduling and dispatching of computational workloads is a well-investigated subject, only recently has Google publicly released a vast, high-resolution measurement dataset of its cloud workloads. We revisit dispatching and scheduling algorithms fed with traffic workloads derived from those measurements. The main finding is that mean job response time attains a minimum as the number of servers in the computing cluster is varied, under the constraint that the overall computational budget is kept constant. Moreover, simple policies such as Join-Idle-Queue appear to attain the same performance as more complex, size-based policies for suitably high degrees of parallelism. Further, multi-stage server clusters achieve still better performance, decisively outperforming size-based dispatching policies, even under very simple policies such as Round-Robin. The takeaway is that the parallelism and architecture of computing systems may be more powerful knobs for controlling performance than scheduling policies are, under realistic workload traffic.
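The core trade-off behind the first finding can be reproduced in miniature: split a fixed total service capacity across n FCFS servers and sweep n while keeping the arrival process and job-size distribution fixed. The sketch below is a toy single-stage simulator, not the paper's model — the arrival rate, the Pareto size distribution, and the random fallback for JIQ when no server is idle are all illustrative assumptions.

```python
import random

def simulate(n_servers, policy, n_jobs=50_000, total_capacity=1.0,
             arrival_rate=0.7, seed=1):
    """Mean job response time for n_servers FCFS servers whose speeds
    sum to total_capacity (the fixed 'computational budget')."""
    rng = random.Random(seed)
    speed = total_capacity / n_servers        # budget split evenly
    free = [0.0] * n_servers                  # time each server next goes idle
    t, rr, total_resp = 0.0, 0, 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)    # Poisson arrivals
        size = rng.paretovariate(2.1) - 1.0   # heavy-tailed job sizes (assumed)
        if policy == "JIQ":
            # Join-Idle-Queue: dispatch to an idle server if one exists,
            # otherwise (assumption) pick a busy server uniformly at random
            idle = [s for s in range(n_servers) if free[s] <= t]
            s = rng.choice(idle) if idle else rng.randrange(n_servers)
        else:                                 # Round-Robin
            s, rr = rr, (rr + 1) % n_servers
        start = max(t, free[s])               # wait if the server is busy
        free[s] = start + size / speed
        total_resp += free[s] - t             # response = completion - arrival
    return total_resp / n_jobs
```

Sweeping `n_servers` with either policy typically traces a U-shaped curve: few fast servers suffer queueing behind large jobs, while many slow servers stretch every job's service time, so the mean response time bottoms out at an intermediate server count.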