🤖 AI Summary
This work addresses the challenge of memory-overload queries in cloud data warehouses, which can cause severe resource waste and service disruptions. Existing admission control methods suffer from limited accuracy, poor interpretability, and inadequate adaptability. To overcome these limitations, we propose SafeLoad, a novel framework that first filters out safe queries using interpretable rules and then accurately identifies overload queries through a hybrid model combining global and cluster-level predictors, augmented with a misclassification correction module. SafeLoad further incorporates a self-tuning quota mechanism that dynamically optimizes prediction performance across clusters. Our contributions include the first admission control framework tailored to memory-overload queries; SafeBench, an open-source industrial benchmark comprising 150 million real-world queries; and an efficient solution integrating interpretability, hybrid modeling, and dynamic quota allocation. Experiments show that SafeLoad improves precision by up to 66% over the best baseline, reduces wasted CPU time by up to 8.09×, and incurs low online and offline overhead.
📝 Abstract
Memory overload is a common form of resource exhaustion in cloud data warehouses. When database queries fail due to memory overload, they not only waste critical resources such as CPU time but also disrupt the execution of core business processes, as memory-overloading (MO) queries are typically part of complex workflows. Identifying such queries in advance and scheduling them to memory-rich serverless clusters can prevent both resource waste and query execution failure. Cloud data warehouses therefore require an admission control framework with high prediction precision, interpretability, efficiency, and adaptability to effectively identify MO queries. However, existing admission control frameworks primarily target scenarios such as SLA satisfaction and resource isolation, and offer limited precision in identifying MO queries. Moreover, there is a lack of publicly available MO-labeled workload datasets for training and benchmarking. To tackle these challenges, we propose SafeLoad, the first query admission control framework specifically designed to identify MO queries. Alongside SafeLoad, we release SafeBench, an open-source, industrial-scale benchmark for this task comprising 150 million real queries. SafeLoad first filters out memory-safe queries using an interpretable discriminative rule. It then applies a hybrid architecture that integrates a global model with cluster-level models, supplemented by a misprediction correction module, to identify MO queries. Additionally, a self-tuning quota management mechanism dynamically adjusts prediction quotas per cluster to improve precision. Experimental results show that SafeLoad achieves state-of-the-art prediction performance with low online and offline time overhead. Specifically, SafeLoad improves precision by up to 66% over the best baseline and reduces wasted CPU time by up to 8.09× compared to scenarios without SafeLoad.
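To make the two-stage pipeline in the abstract concrete, the following is a minimal sketch of how such an admission decision could be structured: an interpretable rule filter first passes memory-safe queries, and a hybrid of a global model and per-cluster models flags the rest. All names, thresholds, and the combination logic (flag overload if either model does) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical two-stage admission sketch; thresholds and model
# interfaces are illustrative, not taken from SafeLoad itself.

def rule_filter(features):
    # Stage 1: an interpretable rule, e.g. estimated memory
    # far below the cluster limit marks the query as safe.
    return features["estimated_memory_mb"] < 512

class HybridPredictor:
    def __init__(self, global_model, cluster_models):
        self.global_model = global_model      # trained on all clusters
        self.cluster_models = cluster_models  # {cluster_id: model}

    def predict_overload(self, features, cluster_id):
        # Stage 2: combine global and cluster-level predictions;
        # here we conservatively flag overload if either model does.
        g = self.global_model(features)
        c = self.cluster_models.get(cluster_id, self.global_model)(features)
        return g or c

def admit(features, cluster_id, predictor):
    if rule_filter(features):
        return "local"       # memory-safe: run on the normal cluster
    if predictor.predict_overload(features, cluster_id):
        return "serverless"  # predicted MO: route to memory-rich cluster
    return "local"
```

In a real system the models would be learned classifiers and the misprediction correction and quota-tuning modules would adjust their outputs over time; this sketch only shows the control flow of the admission decision.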