🤖 AI Summary
This work addresses the asymmetric cost of memory allocation in distributed clusters, where under-allocation risks task failures while over-allocation leads to resource waste. To balance these competing concerns, the authors propose a memory provisioning strategy that integrates conditional quantile regression with a multiplicative safety factor. By ensembling LightGBM and XGBoost to predict high-quantile memory demands, the method explicitly trades off the risks of under- and over-allocation and characterizes their Pareto frontier. Evaluated on a real-world SAP build-task dataset, the approach reduces the fraction of memory-insufficient tasks from 4.17% to 2.89% while significantly cutting the average over-allocation rate from 148% to 44.51%.
📝 Abstract
In modern distributed systems, efficient resource allocation is vital for maintaining scalability, reducing operational costs, and ensuring fast execution across heterogeneous workloads. Predictive models of resource usage are essential tools for optimizing allocation and preventing system bottlenecks. A key challenge in predictive memory allocation is its asymmetric cost: underallocation causes failures, while overallocation wastes memory.
We propose a regression method based on an ensemble of LightGBM and XGBoost models trained to predict high conditional quantiles of memory usage. To further account for the high cost of underallocation, we add a multiplicative safety factor. On a real-world dataset of build jobs provided by SAP, our method reduces the share of underallocated jobs from 4.17% to 2.89% and the average overallocation from 148% to 44.51%. We further explore the Pareto frontier between optimizing for underallocation and optimizing for overallocation.
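The ingredients named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It shows the pinball (quantile) loss that high-quantile regressors typically minimize, a multiplicative safety factor applied to an ensemble prediction, and the two evaluation quantities. The equal-weight averaging of the two models, the definition of average overallocation as the mean relative surplus over sufficiently provisioned jobs, and all function names are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean pinball (quantile) loss at level q in (0, 1).

    For q > 0.5, under-predictions (y_true > y_pred) are penalized more
    heavily than over-predictions, which mirrors the asymmetric cost.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def provision(pred_a: Sequence[float], pred_b: Sequence[float],
              safety_factor: float) -> List[float]:
    """Combine two models' quantile predictions and scale the result.

    Assumption: equal-weight averaging; the source does not state how
    the LightGBM and XGBoost predictions are combined.
    """
    return [safety_factor * 0.5 * (a + b) for a, b in zip(pred_a, pred_b)]


def allocation_metrics(used: Sequence[float],
                       allocated: Sequence[float]) -> Tuple[float, float]:
    """Return (underallocation rate, mean relative overallocation).

    Assumption: overallocation is averaged over jobs that received at
    least as much memory as they used; the paper may define it differently.
    """
    under = sum(1 for u, a in zip(used, allocated) if a < u)
    surplus = [(a - u) / u for u, a in zip(used, allocated) if a >= u]
    over = sum(surplus) / len(surplus) if surplus else 0.0
    return under / len(used), over


# Hypothetical peak memory usage (GB) and two models' quantile predictions.
used = [4.0, 8.0, 2.0, 10.0]
model_a = [3.0, 9.0, 2.0, 8.0]
model_b = [5.0, 7.0, 2.0, 8.0]

for factor in (1.0, 1.25):
    rate, over = allocation_metrics(used, provision(model_a, model_b, factor))
    print(f"safety factor {factor}: under rate {rate:.2f}, mean over {over:.4f}")
```

In this toy data, raising the safety factor from 1.0 to 1.25 removes the single underallocated job at the price of a higher mean overallocation. Sweeping the quantile level and the safety factor and recording both metrics is one way to trace the kind of trade-off curve the abstract refers to as the Pareto frontier.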