🤖 AI Summary
Cloud data centers face challenges in containerized infrastructure resource orchestration, including low accuracy and poor stability, caused by vast configuration search spaces, high workload variability, and strong environmental noise. To address these challenges, we propose a resource orchestration framework that integrates Contextual Multi-Armed Bandits (CMAB) with the Ksurf variance-minimizing estimator. This work is the first to embed Ksurf in the Drone scheduler and to augment it with attention-based Kalman filtering, which dynamically suppresses nonlinear noise. Implemented on Kubernetes, the approach enables fine-grained, low-overhead, real-time resource configuration optimization. Evaluated on the VarBench benchmark, it reduces p95 and p99 latency variance by 41% and 47%, respectively; decreases CPU utilization by 4%; lowers master-node memory footprint by 7 MB; and reduces the average active Pod count by 7%. These improvements significantly enhance resource efficiency and cloud cost-effectiveness.
📝 Abstract
Resource orchestration and configuration parameter search are key concerns for container-based infrastructure in cloud data centers. The large configuration search space and cloud uncertainties are often mitigated with contextual bandit techniques for resource orchestration, including the state-of-the-art Drone orchestrator. Complexity in the cloud provider environment, caused by varying numbers of virtual machines, introduces variability into workloads and resource metrics, making orchestration decisions less accurate due to increased nonlinearity and noise. Ksurf, a state-of-the-art variance-minimizing estimator well suited to highly variable cloud data, enables optimal resource estimation under high cloud variability. This work evaluates the performance of Ksurf on estimation-based resource orchestration tasks involving highly variable workloads, employing it as the contextual multi-armed bandit objective function model in Drone. Ksurf lowers latency variance by 41% at p95 and 47% at p99, reduces CPU usage by 4% and master-node memory usage by 7 MB on Kubernetes, and yields a 7% cost saving in average worker pod count on the VarBench Kubernetes benchmark.
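To make the bandit-plus-estimator idea concrete, the sketch below shows a minimal multi-armed bandit whose per-arm reward model is a scalar Kalman filter, i.e., a simple variance-minimizing estimator standing in for Ksurf. This is an illustrative toy, not the paper's implementation: the arm count, reward values, noise levels, and the UCB-style exploration bonus are all assumptions, and the real system conditions decisions on workload context and uses attention-based Kalman filtering.

```python
import random

class KalmanArmEstimator:
    """Scalar Kalman filter tracking one arm's mean reward.

    Illustrative stand-in for a variance-minimizing estimator;
    the paper's Ksurf estimator is more sophisticated.
    """
    def __init__(self, process_var=1e-3, obs_var=1.0):
        self.mean = 0.0              # estimated reward for this arm
        self.var = 1.0               # uncertainty of the estimate
        self.process_var = process_var
        self.obs_var = obs_var

    def update(self, reward):
        # Predict step: uncertainty grows by the process noise.
        self.var += self.process_var
        # Correct step: Kalman gain weights the new observation
        # against the current estimate, shrinking the variance.
        gain = self.var / (self.var + self.obs_var)
        self.mean += gain * (reward - self.mean)
        self.var *= (1.0 - gain)

def select_arm(estimators, explore=2.0):
    """UCB-style selection: estimated mean plus an uncertainty bonus."""
    return max(range(len(estimators)),
               key=lambda i: estimators[i].mean
                             + explore * estimators[i].var ** 0.5)

# Toy loop: three hypothetical resource configurations with noisy rewards.
random.seed(0)
true_rewards = [0.3, 0.7, 0.5]       # assumed ground-truth arm values
arms = [KalmanArmEstimator() for _ in true_rewards]
for _ in range(500):
    i = select_arm(arms)
    arms[i].update(true_rewards[i] + random.gauss(0, 0.2))

best = max(range(len(arms)), key=lambda i: arms[i].mean)
print(best)  # the bandit should settle on the highest-reward arm
```

Because the Kalman update discounts noisy observations in proportion to their variance, the arm estimates stay stable under measurement noise, which is the property that motivates pairing a variance-minimizing estimator with the bandit's objective function.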