Ksurf-Drone: Attention Kalman Filter for Contextual Bandit Optimization in Cloud Resource Allocation

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cloud data centers face challenges in containerized infrastructure resource orchestration—including low accuracy and poor stability—due to vast configuration search spaces, high workload variability, and strong environmental noise. To address these, we propose a novel resource orchestration framework integrating Contextual Multi-Armed Bandits (CMAB) with the Ksurf variance-minimization estimator. This work is the first to embed Ksurf into the Drone scheduler and augment it with attention-based Kalman filtering to dynamically suppress nonlinear noise. Implemented within Kubernetes, our approach enables fine-grained, low-overhead, real-time resource configuration optimization. Evaluated on the VarBench benchmark, it reduces p95 and p99 latency variance by 41% and 47%, respectively; decreases CPU utilization by 4%; lowers master-node memory footprint by 7 MB; and reduces average active Pod count by 7%. These improvements significantly enhance resource efficiency and cloud cost-effectiveness.

📝 Abstract
Resource orchestration and configuration parameter search are key concerns for container-based infrastructure in cloud data centers. Large configuration search spaces and cloud uncertainties are often mitigated using contextual bandit techniques for resource orchestration, including the state-of-the-art Drone orchestrator. Complexity in the cloud provider environment due to varying numbers of virtual machines introduces variability in workloads and resource metrics, which increases nonlinearity and noise and makes orchestration decisions less accurate. Ksurf, a state-of-the-art variance-minimizing estimator suited to highly variable cloud data, enables optimal resource estimation under conditions of high cloud variability. This work evaluates the performance of Ksurf on estimation-based resource orchestration tasks involving highly variable workloads when employed as the objective function model of a contextual multi-armed bandit for cloud scenarios using Drone. Ksurf reduces latency variance by 41% at p95 and 47% at p99, and reduces CPU usage by 4% and master node memory usage by 7 MB on Kubernetes, resulting in a 7% cost saving in average worker pod count on the VarBench Kubernetes benchmark.
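To make the headline metric concrete, the following is a minimal sketch of how a "latency variance at p95" figure and its relative reduction could be computed. It is an illustration only, not the paper's evaluation code: the function names, the nearest-rank percentile rule, and the use of population variance across repeated runs are all assumptions.

```python
import statistics


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list; q is an integer in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of q * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def tail_latency_variance(runs, q):
    """Population variance of the q-th percentile latency across runs.

    `runs` is a list of latency sample lists, one per benchmark run
    (an assumed layout, not taken from the paper).
    """
    return statistics.pvariance([percentile(run, q) for run in runs])


def relative_reduction(baseline, candidate):
    """Fractional reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline
```

Under these assumptions, a baseline variance of 100 and a candidate variance of 59 would give `relative_reduction(100.0, 59.0)`, a reduction of 41%.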
Problem

Research questions and friction points this paper is trying to address.

Optimizing cloud resource allocation for container infrastructure under uncertainty
Reducing latency variance and resource usage in Kubernetes orchestration
Improving contextual bandit decision accuracy in highly variable cloud environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Kalman Filter reduces latency variance
Ksurf method optimizes resource estimation under variability
Contextual bandit model improves cloud orchestration accuracy
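The general idea of pairing a noise-suppressing estimator with a contextual bandit can be sketched as follows. This is a deliberately simplified illustration: it uses a plain scalar Kalman filter and an epsilon-greedy policy, not Ksurf's attention-based filter or Drone's actual algorithm, and every class name, parameter, and default value here is hypothetical.

```python
import random


class ScalarKalman:
    """1-D Kalman filter tracking a slowly drifting value from noisy readings."""

    def __init__(self, process_var=1e-3, obs_var=1.0):
        self.mean = 0.0
        self.var = 1e6  # large prior variance: nothing is known yet
        self.process_var = process_var
        self.obs_var = obs_var

    def update(self, observation):
        # Predict: the true value may have drifted, so uncertainty grows.
        self.var += self.process_var
        # Correct: blend prediction and observation using the Kalman gain.
        gain = self.var / (self.var + self.obs_var)
        self.mean += gain * (observation - self.mean)
        self.var *= 1.0 - gain
        return self.mean


class FilteredBandit:
    """Epsilon-greedy contextual bandit with one filter per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.filters = {}

    def _filter(self, context, arm):
        key = (context, arm)
        if key not in self.filters:
            self.filters[key] = ScalarKalman()
        return self.filters[key]

    def select(self, context):
        # Explore with probability epsilon, otherwise pick the arm whose
        # filtered reward estimate is highest for this context.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self._filter(context, a).mean)

    def observe(self, context, arm, reward):
        # The filter smooths noisy rewards before they influence selection.
        self._filter(context, arm).update(reward)
```

In a setting like the one described, the arms might be candidate resource configurations (for example replica counts), the context a discrete workload label, and the reward a negated latency measurement. The filter keeps a single noisy measurement from swinging the arm ranking once a few observations have accumulated.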
🔎 Similar Papers
2024-09-27, International Conference on Service Oriented Computing, Citations: 0
Michael Dang'ana
Electrical & Computer Engineering, University of Toronto
Yuqiu Zhang
Electrical & Computer Engineering, University of Toronto
Hans-Arno Jacobsen
Professor of Computer Engineering and Computer Science
data management, middleware, distributed systems, event processing, blockchains