The High Cost of Keeping Warm: Characterizing Overhead in Serverless Autoscaling Policies

πŸ“… 2025-09-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This paper exposes a fundamental trade-off between performance and cost in mainstream autoscaling strategies for serverless computing: frequent instance cold starts and shutdowns incur 10–40% additional CPU overhead, while memory allocation exhibits 2–10× redundancy; existing optimizations often sacrifice significant latency. To address this, the authors develop a reproducible, transparent evaluation framework, open-sourcing a system that accurately emulates the control-plane behaviors of AWS Lambda and Google Cloud Run, integrated with real-world deployments and large-scale simulations. This enables the first systematic, quantitative characterization of latency, memory, and CPU overhead under realistic synchronous and asynchronous workloads. Key contributions include: (i) precise identification of autoscaling efficiency bottlenecks; (ii) formulation of novel, overhead-aware scaling design principles; and (iii) an empirical foundation and methodological guidance for building high-performance, cost-efficient serverless control planes.

πŸ“ Abstract
Serverless computing is transforming cloud application development, but the performance-cost trade-offs of control plane designs remain poorly understood due to a lack of open, cross-platform benchmarks and detailed system analyses. In this work, we address these gaps by designing a serverless system that approximates the scaling behaviors of commercial providers, including AWS Lambda and Google Cloud Run. We systematically compare the performance and cost-efficiency of both synchronous and asynchronous autoscaling policies by replaying real-world workloads and varying key autoscaling parameters. We demonstrate that our open-source systems can closely replicate the operational characteristics of commercial platforms, enabling reproducible and transparent experimentation. By evaluating how autoscaling parameters affect latency, memory usage, and CPU overhead, we reveal several key findings. First, we find that serverless systems exhibit significant computational overhead due to instance churn, equivalent to 10-40% of the CPU cycles spent on request handling, and primarily originating from worker nodes. Second, we observe high memory allocation due to the scaling policy: 2-10 times more memory than is actively used. Finally, we demonstrate that reducing these overheads typically causes significant performance degradation in current systems, underscoring the need for new, cost-efficient autoscaling strategies. Additionally, we employ a hybrid methodology that combines real control plane deployments with large-scale simulation to extend our evaluation closer to production scale, thereby bridging the gap between small research clusters and real-world environments.
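The churn and idle-memory overheads the abstract quantifies both stem from keep-alive autoscaling: each cold start burns extra CPU, while each warm-but-idle instance holds memory. A toy discrete-event sketch can make the trade-off concrete. This is not the paper's evaluation framework; the function, trace, and all parameter values below are invented for illustration, and it models one request per instance with a fixed keep-alive window.

```python
def simulate_keep_alive(arrivals, keep_alive, cold_start, service):
    """Toy model of a fixed keep-alive policy (all times in seconds).

    One request per instance. Returns the number of cold starts, the
    churn overhead (cold-start CPU as a fraction of useful service CPU),
    and the warm-idle ratio (idle instance-seconds per busy second).
    Instances still warm when the trace ends are ignored for simplicity.
    """
    warm = []                       # idle pool: (available_since, expires_at)
    cold_starts, busy, idle = 0, 0.0, 0.0
    for t in sorted(arrivals):
        # Retire instances whose keep-alive window elapsed before t.
        for w in [w for w in warm if w[1] <= t]:
            idle += w[1] - w[0]
            warm.remove(w)
        available = [w for w in warm if w[0] <= t]
        if available:
            w = available[0]        # reuse a warm instance
            warm.remove(w)
            idle += t - w[0]        # time it sat warm before reuse
        else:
            cold_starts += 1        # no warm instance: pay a cold start
        busy += service
        # After serving, the instance stays warm for keep_alive seconds.
        warm.append((t + service, t + service + keep_alive))
    return cold_starts, cold_starts * cold_start / busy, idle / busy

# Bursty toy trace: a burst at t=0, then arrivals after keep-alive expiry.
trace = [0, 0, 0, 100, 700, 1400]
cs, churn, idle_ratio = simulate_keep_alive(
    trace, keep_alive=600, cold_start=1.0, service=0.5)
```

Even this toy exhibits the tension the paper characterizes: shrinking `keep_alive` cuts the idle ratio but raises the cold-start count (and thus churn overhead and latency), while growing it does the reverse.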
Problem

Research questions and friction points this paper is trying to address.

Analyzing serverless autoscaling performance-cost trade-offs across platforms
Quantifying computational and memory overhead from instance churn
Evaluating parameter impacts on latency and resource efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Designed serverless system mimicking commercial scaling behaviors
Systematically compared synchronous and asynchronous autoscaling policies
Employed hybrid methodology combining real deployments with simulation
Leonid Kondrashov
NTU, Singapore
Boxi Zhou
NTU, Singapore
Hancheng Wang
Nanjing University, China
Dmitrii Ustiugov
NTU, Singapore
Cloud computing · Serverless · Systems for ML