🤖 AI Summary
Kubernetes' default scheduler uses lightweight heuristics that can fragment resources and leave otherwise deployable Pods unscheduled. This paper proposes a constraint programming (CP) approach to Pod placement, implemented with the OR-Tools solver as a plug-in to the default scheduler that acts as a fallback when some Pods cannot be allocated. Within a fixed time budget (1 s or 10 s), the solver searches for an allocation that places more higher-priority Pods, or certifies that the default scheduler's placement is already optimal. In experiments on small-to-mid-sized clusters, the approach places more higher-priority Pods than the default scheduler in over 44% of realisable scenarios where the default scheduler fails with a 1-second window, and in over 73% with a 10-second window. In both settings it certifies that the default scheduler's placement is already optimal in over 19% of scenarios.
📝 Abstract
Distributed applications employ Kubernetes for scalable, fault-tolerant deployments over computer clusters, where application components run in groups of containers called pods. The scheduler, at the heart of Kubernetes' architecture, determines the placement of pods on cluster nodes given their priority and resource requirements. To quickly allocate pods, the scheduler uses lightweight heuristics that can lead to suboptimal placements and resource fragmentation, preventing allocations of otherwise deployable pods on the available nodes. We propose the use of constraint programming to find the optimal allocation of pods satisfying all their priorities and resource requests. Implementation-wise, our solution comes as a plug-in to the default scheduler that operates as a fallback mechanism when some pods cannot be allocated. Using the OR-Tools constraint solver, our experiments on small-to-mid-sized clusters indicate that, within a 1-second scheduling window, our approach places more higher-priority pods than the default scheduler (possibly demonstrating allocation optimality) in over 44% of realisable allocation scenarios where the default scheduler fails, while certifying that the default scheduler's placement is already optimal in over 19% of scenarios. With a 10-second window, our approach improves placements in over 73% of scenarios and still certifies that the default scheduler's placement is already optimal in over 19% of scenarios.
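The fragmentation problem described in the abstract can be illustrated with a toy sketch. This is not the paper's model: the node capacities, pod names, the spread-style greedy rule, and the lexicographic objective are all assumptions chosen for illustration, and a plain exhaustive search stands in for the CP solver (a solver such as OR-Tools would replace `optimal_placement` on realistic instances).

```python
from itertools import product

# Hypothetical toy cluster: free CPU cores per node.
NODES = {"node-a": 4, "node-b": 4}
# Hypothetical pods in arrival order: (name, cpu request, priority).
# A larger priority value means a more important pod.
PODS = [("web-1", 2, 1), ("web-2", 2, 1), ("db", 4, 2)]


def greedy_spread(pods, nodes):
    """Place each pod on the fitting node with the most free CPU (assumed heuristic)."""
    free = dict(nodes)
    placement = {}
    for name, cpu, _priority in pods:
        fitting = [n for n in free if free[n] >= cpu]
        if fitting:
            target = max(fitting, key=lambda n: free[n])
            free[target] -= cpu
            placement[name] = target
    return placement


def score(placement, pods):
    """Count placed pods per priority level, highest level first (compared lexicographically)."""
    levels = sorted({prio for _, _, prio in pods}, reverse=True)
    return tuple(
        sum(1 for name, _, prio in pods if prio == lvl and name in placement)
        for lvl in levels
    )


def optimal_placement(pods, nodes):
    """Exhaustive search over all assignments; a stand-in for a CP solver."""
    choices = [None] + list(nodes)  # None means the pod stays unscheduled
    best, best_score = {}, None
    for assignment in product(choices, repeat=len(pods)):
        used = {n: 0 for n in nodes}
        for (_, cpu, _), node in zip(pods, assignment):
            if node is not None:
                used[node] += cpu
        if any(used[n] > nodes[n] for n in nodes):
            continue  # violates a node's CPU capacity
        placement = {
            name: node
            for (name, _, _), node in zip(pods, assignment)
            if node is not None
        }
        current = score(placement, pods)
        if best_score is None or current > best_score:
            best, best_score = placement, current
    return best


if __name__ == "__main__":
    greedy = greedy_spread(PODS, NODES)
    best = optimal_placement(PODS, NODES)
    print("greedy :", greedy, score(greedy, PODS))
    print("optimal:", best, score(best, PODS))
```

In this toy instance the greedy rule spreads the two small pods across both nodes, so the higher-priority `db` pod no longer fits anywhere, while the exhaustive search packs the small pods together and places all three. When the search returns the same score as the greedy placement, that placement is optimal under this objective, which mirrors the certification case reported in the abstract.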