🤖 AI Summary
In serverless FaaS environments, shared infrastructure induces performance variability across function instances, resulting in increased execution latency and higher time-based billing costs. To address this, we propose Performance-Aware Instance Culling (PAIC), a runtime scheduling mechanism that embeds lightweight benchmarking during function cold starts to detect underperforming instances in real time. PAIC integrates dynamic instance termination with request requeuing, proactively retiring slow instances while reusing previously validated high-performance ones. This work introduces the first “self-terminating instance” paradigm for performance-driven scheduling and enables fine-grained, performance-aware orchestration at the FaaS runtime layer. Experimental evaluation on resource-intensive data-processing workflows demonstrates a 13% acceleration in critical stages, a 4% end-to-end performance improvement, and an equivalent 4% cost reduction. Gains are more pronounced in long-running workflows.
📝 Abstract
Serverless Function-as-a-Service (FaaS) is a popular cloud paradigm to quickly and cheaply implement complex applications. Because the function instances cloud providers start to execute user code run on shared infrastructure, their performance can vary. From a user perspective, slower instances not only take longer to complete, but also increase cost due to the pay-per-use model of FaaS services where execution duration is billed with microsecond accuracy. In this paper, we present Minos, a system to take advantage of this performance variation by intentionally terminating instances that are slow. Fast instances are not terminated, so that they can be re-used for subsequent invocations. One use case for this are data processing and machine learning workflows, which often download files as a first step, during which Minos can run a short benchmark. Only if the benchmark passes, the main part of the function is actually executed. Otherwise, the request is re-queued and the instance crashes itself, so that the platform has to assign the request to another (potentially faster) instance. In our experiments, this leads to a speedup of up to 13% in the resource intensive part of a data processing workflow, resulting in up to 4% faster overall performance (and consequently 4% cheaper prices). Longer and complex workflows lead to increased savings, as the pool of fast instances is re-used more often. For platforms exhibiting this behavior, users get better performance and save money by wasting more of the platforms resources.