🤖 AI Summary
This work addresses performance interference at the hyperthreading level caused by co-locating multiple microservices on cloud servers, a challenge poorly handled by existing schedulers due to their inability to capture hyperthread heterogeneity and dynamic SMT contention. The authors propose Hestia, a novel scheduling framework that, for the first time, identifies and models the strong asymmetry between two contention patterns—shared-core (SC) and shared-socket (SS)—and incorporates a self-attention mechanism to enable fine-grained interference-aware placement decisions. Leveraging tens of thousands of production instances, Hestia builds CPU usage prediction and interference scoring models to optimize microservice placement. Experimental results demonstrate that Hestia significantly outperforms five state-of-the-art schedulers across diverse contention scenarios, achieving up to a 30.65% performance gain, reducing 95th-percentile tail latency by as much as 80%, and lowering total CPU consumption by 2.3% under identical workloads.
📝 Abstract
Modern cloud servers routinely co-locate multiple latency-sensitive microservice instances to improve resource efficiency. However, the diversity of microservice behaviors, coupled with mutual performance interference under simultaneous multithreading (SMT), makes large-scale placement increasingly complex. Existing interference-aware schedulers and isolation techniques rely on coarse core-level profiling or static resource partitioning, leaving asymmetric hyperthread-level heterogeneity and SMT contention dynamics largely unmodeled. We present Hestia, a hyperthread-level, interference-aware scheduling framework powered by self-attention. Through an extensive analysis of production traces encompassing 32,408 instances across 3,132 servers, we identify two dominant contention patterns -- sharing-core (SC) and sharing-socket (SS) -- and reveal strong asymmetry in their impact. Guided by these insights, Hestia incorporates (1) a self-attention-based CPU usage predictor that models SC/SS contention and hardware heterogeneity, and (2) an interference scoring model that estimates pairwise contention risks to guide scheduling decisions. We evaluate Hestia through large-scale simulation and a real production deployment. Hestia reduces the 95th-percentile service latency by up to 80%, lowers overall CPU consumption by 2.3% under the same workload, and surpasses five state-of-the-art schedulers by up to 30.65% across diverse contention scenarios.
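To make the scheduling idea concrete, here is a minimal, purely illustrative sketch of interference-aware placement over hyperthreads. Everything in it is an assumption for illustration: the `Hyperthread` class, the multiplicative scoring formula, and the weights are hypothetical stand-ins, not Hestia's actual learned self-attention predictor or interference model. It only demonstrates the core intuition the abstract describes: shared-core (SC) contention between sibling hyperthreads is penalized far more heavily than shared-socket (SS) contention, and the scheduler greedily picks the lowest-risk slot.

```python
# Illustrative sketch only: names, weights, and the scoring formula are
# invented for this example; Hestia itself uses learned models.
from dataclasses import dataclass

@dataclass
class Hyperthread:
    core: int          # physical core id
    socket: int        # socket id
    load: float = 0.0  # predicted CPU usage of instances already placed here

def interference_score(candidate: Hyperthread, threads: list[Hyperthread],
                       new_load: float,
                       sc_weight: float = 1.0, ss_weight: float = 0.2) -> float:
    """Score a candidate hyperthread. Same-hyperthread time-sharing is worst,
    shared-core (SC, sibling hyperthread) is heavily penalized, and
    shared-socket (SS, other core on the same socket) is penalized lightly,
    reflecting the SC/SS asymmetry the paper reports."""
    score = 2.0 * sc_weight * candidate.load * new_load  # same hyperthread
    for ht in threads:
        if ht is candidate:
            continue
        if ht.core == candidate.core:        # SC: sibling on the same core
            score += sc_weight * ht.load * new_load
        elif ht.socket == candidate.socket:  # SS: same socket, other core
            score += ss_weight * ht.load * new_load
    return score

def place(instance_load: float, threads: list[Hyperthread]) -> Hyperthread:
    """Greedy placement: choose the hyperthread with the lowest score."""
    best = min(threads,
               key=lambda ht: interference_score(ht, threads, instance_load))
    best.load += instance_load
    return best

# One socket, two cores, two hyperthreads each; core 0 is already busy.
threads = [Hyperthread(core=0, socket=0, load=0.8),
           Hyperthread(core=0, socket=0),
           Hyperthread(core=1, socket=0),
           Hyperthread(core=1, socket=0)]
chosen = place(0.5, threads)
print(chosen.core)  # the idle core, avoiding the busy core's sibling
```

Running the example, the new instance lands on core 1: sharing a socket with the busy hyperthread costs far less than sharing its core. A real scheduler would replace the hand-set weights with the predicted CPU usage and pairwise interference scores that Hestia's models produce.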