🤖 AI Summary
To address high tail latency caused by resource contention in edge-cloud collaborative environments, this paper proposes a performance-aware load-balancing method based on round-trip time (RTT) prediction. Unlike conventional reactive strategies, the approach employs a lightweight time-series forecasting model to schedule requests proactively. We formally quantify the minimum prediction accuracy required for effective scheduling and identify the key system-level factors that influence it. Using monitoring data from a Kubernetes-managed GPU cluster, we construct a compact feature set that exploits strong correlations among metrics, achieving up to 95% prediction accuracy while keeping prediction delay below 10% of the application RTT. Simulation-based results demonstrate significant reductions in end-to-end and tail latency, improved resource utilization, and robustness across multi-tenant co-location scenarios and heterogeneous hardware.
📝 Abstract
Distributed applications increasingly demand low end-to-end latency, especially in edge and cloud environments where co-located workloads contend for limited resources. Traditional load-balancing strategies are typically reactive and rely on outdated or coarse-grained metrics, often leading to suboptimal routing decisions and increased tail latencies. This paper investigates the use of round-trip time (RTT) predictors to enhance request routing by anticipating application latency. We develop lightweight and accurate RTT predictors that are trained on time-series monitoring data collected from a Kubernetes-managed GPU cluster. By leveraging a reduced set of highly correlated monitoring metrics, our approach maintains low overhead while remaining adaptable to diverse co-location scenarios and heterogeneous hardware. The predictors achieve up to 95% accuracy while keeping the prediction delay within 10% of the application RTT. In addition, we identify the minimum prediction accuracy threshold and key system-level factors required to ensure effective predictor deployment in resource-constrained clusters. Simulation-based evaluation demonstrates that performance-aware load balancing can significantly reduce application RTT and minimize resource waste. These results highlight the feasibility of integrating predictive load balancing into future production systems.
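The core idea of the abstract, routing each request to the backend with the lowest *predicted* RTT rather than reacting to stale metrics, can be sketched as follows. Note the paper trains time-series predictors on Kubernetes monitoring metrics; this sketch substitutes a trivially lightweight exponentially weighted moving average (EWMA) as a stand-in predictor, and all class, method, and backend names are illustrative, not from the paper.

```python
# Minimal sketch of performance-aware (predictive) load balancing.
# Assumption: an EWMA over observed RTTs stands in for the paper's
# time-series forecasting model; names are hypothetical.

class EwmaRttPredictor:
    """Predicts a backend's next RTT as an EWMA of its observed RTTs."""

    def __init__(self, alpha: float = 0.3, initial_rtt_ms: float = 50.0):
        self.alpha = alpha            # weight given to the newest sample
        self.estimate = initial_rtt_ms

    def observe(self, rtt_ms: float) -> None:
        # Blend the new measurement into the running estimate.
        self.estimate = self.alpha * rtt_ms + (1 - self.alpha) * self.estimate

    def predict(self) -> float:
        return self.estimate


class PredictiveLoadBalancer:
    """Routes each request to the backend with the lowest predicted RTT."""

    def __init__(self, backends):
        self.predictors = {b: EwmaRttPredictor() for b in backends}

    def pick_backend(self) -> str:
        # Proactive decision: compare predictions, not last-seen metrics.
        return min(self.predictors, key=lambda b: self.predictors[b].predict())

    def report(self, backend: str, rtt_ms: float) -> None:
        # Feed completed-request RTTs back into that backend's predictor.
        self.predictors[backend].observe(rtt_ms)


lb = PredictiveLoadBalancer(["edge-gpu-0", "edge-gpu-1"])
lb.report("edge-gpu-0", 120.0)   # edge-gpu-0 is congested
lb.report("edge-gpu-1", 20.0)    # edge-gpu-1 is lightly loaded
print(lb.pick_backend())         # → edge-gpu-1
```

A real deployment would replace the EWMA with the trained predictor and, per the paper's accuracy threshold, fall back to a reactive policy when prediction quality drops below the minimum required for the scheduler to beat round-robin.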