The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing

📅 2025-10-27
🤖 AI Summary
This study examines virtual machine (VM) scheduling and placement in an enterprise cloud environment. Using operational telemetry from SAP's cloud platform, covering roughly 1,800 hypervisors and 48,000 VMs over a 30-day observation period, we assemble and publicly release a fine-grained time-series dataset of fully virtualized workloads that spans both SAP S/4HANA and general-purpose applications. The metrics are collected with observability tooling that tracks resource usage and performance across the entire infrastructure. Our analysis identifies several suboptimal scheduling situations: CPU contention exceeding 40%, CPU ready times of up to 220 seconds, significantly imbalanced compute hosts with CPU utilization of up to 99% on hosts within a building block, and overprovisioned CPU and memory, with over 80% of VMs using less than 70% of the resources provided to them. From these findings we derive requirements for workload-aware placement and scheduling algorithms, and the released dataset enables data-driven evaluation of such algorithms against production conditions.

📝 Abstract
Allocating resources in a distributed environment is a fundamental challenge. In this paper, we analyze the scheduling and placement of virtual machines (VMs) in the cloud platform of SAP, the world's largest enterprise resource planning software vendor. Based on data from roughly 1,800 hypervisors and 48,000 VMs within a 30-day observation period, we highlight potential improvements for workload management. The data was measured through observability tooling that tracks resource usage and performance metrics across the entire infrastructure. In contrast to existing datasets, ours uniquely offers fine-grained time-series telemetry data of fully virtualized enterprise-level workloads from both long-running and memory-intensive SAP S/4HANA and diverse, general-purpose applications. Our key findings include several suboptimal scheduling situations, such as CPU resource contention exceeding 40%, CPU ready times of up to 220 seconds, significantly imbalanced compute hosts with a maximum CPU utilization of up to 99% on hosts within the same building block, and overprovisioned CPU and memory resources, resulting in over 80% of VMs using less than 70% of the provided resources. Bolstered by these findings, we derive requirements for the design and implementation of novel placement and scheduling algorithms and provide guidance to optimize resource allocations. We make the full dataset used in this study publicly available to enable data-driven evaluations of scheduling approaches for large-scale cloud infrastructures in future research.
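To make two of the reported metrics concrete, the following is a minimal sketch of how an overprovisioning share and a host utilization spread could be computed from per-VM and per-host utilization ratios. It is not the authors' analysis code: the function names, the sample values, and the choice to represent each VM by a single utilization ratio (used divided by provisioned) are assumptions for illustration, and the abstract does not state how utilization is aggregated over time.

```python
def overprovisioned_share(vm_utilization, threshold=0.70):
    """Return the fraction of VMs whose utilization ratio is below `threshold`.

    `vm_utilization` maps a VM name to used / provisioned resources (0.0 to 1.0).
    The single-ratio-per-VM representation is an assumption of this sketch.
    """
    if not vm_utilization:
        raise ValueError("need at least one VM")
    below = sum(1 for ratio in vm_utilization.values() if ratio < threshold)
    return below / len(vm_utilization)


def utilization_spread(host_utilization):
    """Return the gap between the busiest and the least busy host in a group."""
    if not host_utilization:
        raise ValueError("need at least one host")
    values = list(host_utilization.values())
    return max(values) - min(values)


# Hypothetical sample values, not taken from the dataset.
vms = {"vm-a": 0.35, "vm-b": 0.62, "vm-c": 0.91, "vm-d": 0.10, "vm-e": 0.69}
hosts = {"host-1": 0.99, "host-2": 0.41, "host-3": 0.18}

print(f"VMs below 70% utilization: {overprovisioned_share(vms):.0%}")
print(f"Host utilization spread: {utilization_spread(hosts):.2f}")
```

A large spread within one group of hosts suggests that rebalancing could relieve the busiest host, while a high overprovisioning share points to VM sizes that exceed actual demand.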
Problem

Research questions and friction points this paper is trying to address.

Analyzing VM scheduling and placement in SAP's cloud platform
Identifying suboptimal resource allocation and performance issues
Deriving requirements for improved cloud scheduling algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzed SAP cloud VM scheduling using fine-grained telemetry data
Identified suboptimal resource allocation and performance bottlenecks
Derived requirements and guidance for novel placement and scheduling algorithms