🤖 AI Summary
This work addresses the challenge of accurately identifying performance bottlenecks caused by shared-resource contention in multi-tenant, hardware-heterogeneous environments, where traditional metrics often fall short. The authors propose an abstraction termed “buoyancy,” which unifies application-level performance headroom and system-wide resource contention in a single performance characterization intended to generalize across heterogeneous platforms. By combining application monitoring with analysis of system-level resource contention, the approach identifies bottlenecks across shared CPU, memory, and I/O resources in a coordinated way. In an evaluation with representative multi-tenant workloads, buoyancy gives, on average, a 19.3% better indication of bottlenecks than conventional heuristics. The metric is also designed as a drop-in replacement for conventional performance metrics, so existing schedulers can use it to make better-informed orchestration decisions without significant architectural changes.
📝 Abstract
Modern multi-tenant, hardware-heterogeneous computing environments pose significant challenges for effective workload orchestration. Simple heuristics for assessing workload performance, such as CPU utilization or application-level metrics, are often insufficient to capture the complex performance dynamics arising from resource contention and noisy-neighbor effects. In such environments, performance bottlenecks may emerge in any shared system resource, leading to unexpected and difficult-to-diagnose degradation.
This paper introduces buoyancy, a novel abstraction for characterizing workload performance in multi-tenant systems. Unlike traditional approaches, buoyancy integrates application-level metrics with system-level insight into shared-resource contention to provide a holistic view of performance dynamics. By explicitly capturing bottlenecks and headroom across multiple resources, buoyancy facilitates resource-aware and application-aware orchestration in a manner that is intuitive, extensible, and generalizable across heterogeneous platforms. We evaluate buoyancy using representative multi-tenant workloads to illustrate its ability to expose performance-limiting resource interactions. On average, buoyancy provides a 19.3% better indication of bottlenecks than traditional heuristics. We additionally show how buoyancy can act as a drop-in replacement for conventional performance metrics, enabling improved observability and more informed scheduling and optimization decisions.
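The abstract does not give buoyancy's formula. The following is a hypothetical Python sketch of the general idea it describes: compute headroom for each shared resource and for the application, then report the tightest one as the bottleneck. The `ResourceSample` schema, the SLO-based `app_headroom`, and the min-of-headrooms rule are all illustrative assumptions, not the authors' definition.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ResourceSample:
    """One shared resource observed over a sampling window (hypothetical schema)."""
    name: str        # e.g. "cpu", "mem_bw", "disk_io"
    used: float      # aggregate demand from all co-located tenants
    capacity: float  # what the hardware can deliver, in the same unit as `used`


def resource_headroom(sample: ResourceSample) -> float:
    """Fraction of capacity still free, clamped to [0, 1]."""
    free = 1.0 - sample.used / sample.capacity
    return min(1.0, max(0.0, free))


def app_headroom(p99_latency_ms: float, slo_ms: float) -> float:
    """Hypothetical application-level headroom: distance from a latency SLO, clamped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - p99_latency_ms / slo_ms))


def buoyancy_sketch(
    samples: List[ResourceSample], app_slack: float
) -> Tuple[float, str, Dict[str, float]]:
    """Combine per-resource headroom with application headroom.

    Returns the smallest headroom (the limiting factor), the name of the
    resource or signal that produced it, and the full per-signal breakdown.
    The min-of-headrooms rule is an illustrative assumption, not the paper's formula.
    """
    breakdown = {s.name: resource_headroom(s) for s in samples}
    breakdown["app"] = app_slack
    bottleneck = min(breakdown, key=breakdown.__getitem__)
    return breakdown[bottleneck], bottleneck, breakdown


if __name__ == "__main__":
    # Noisy-neighbor scenario: CPU looks fine, but memory bandwidth is nearly saturated.
    samples = [
        ResourceSample("cpu", used=16.0, capacity=40.0),        # 40% of cores busy
        ResourceSample("mem_bw", used=19.0, capacity=20.0),     # 95% of GB/s consumed
        ResourceSample("disk_io", used=300.0, capacity=1000.0),  # 30% of IOPS consumed
    ]
    score, bottleneck, breakdown = buoyancy_sketch(samples, app_headroom(8.0, 10.0))
    print(f"CPU-only headroom: {resource_headroom(samples[0]):.2f}")
    print(f"sketch score: {score:.2f} (limited by {bottleneck})")
```

With these made-up numbers, a CPU-only heuristic reports 0.60 headroom, while the sketch reports 0.05 and names `mem_bw` as the limit. This is the kind of noisy-neighbor bottleneck the abstract says utilization-based heuristics miss. Taking the minimum keeps the result a single scalar, so it could slot in wherever a utilization value is consumed. A real metric would likely also need per-resource weighting or application sensitivity, which this sketch omits.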