🤖 AI Summary
This work addresses elevated latency during traffic bursts in virtual machine (VM)-based microservices. VMs scale too slowly to absorb sudden spikes, which degrades performance, while over-provisioning to avoid this incurs unnecessary cost. To reconcile this trade-off, the authors propose Flare, a hybrid architecture that offloads only the excess requests of overloaded services in a microservice chain to a serverless platform, without modifying application code. Flare uses VMs for steady-state workloads and serverless computing to absorb transient peaks, and it integrates with existing auto-scaling mechanisms and serverless infrastructure. According to the authors' evaluation, Flare keeps latency low under bursty load while substantially reducing idle resources and operating cost.
📝 Abstract
Online services strive to maintain application responsiveness even when the traffic is unpredictable and fluctuating. Today's online services are commonly deployed as chains of microservices, each microservice packaged as one or more containers inside virtual machines (VMs). While performant and affordable when the load is steady, VM-based deployments are known to be slow to scale when the load spikes, resulting in degraded performance for end-users of the service. To avoid such performance degradations, service providers can over-provision their deployments; however, such a strategy is costly and inefficient, leaving resources under-utilized for extended periods.
To address the challenge of unpredictable load spikes, we propose Flare, a hybrid microservice architecture that combines VMs with serverless computing. Flare uses VMs to handle steady workloads cost-effectively and relies on serverless elasticity to absorb traffic spikes. When a spike occurs, Flare detects which specific service(s) are overloaded and shifts the excess load of only those services to serverless functions, thus minimizing the cost overhead. Flare integrates into existing auto-scaling and serverless infrastructure, requiring minimal changes to the control plane and no modifications to the application.
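The abstract does not specify how Flare detects overload or splits traffic, so the following is only a minimal Python sketch of the general idea under stated assumptions: each service's VM pool is assumed to have a known request-rate capacity (`vm_capacity_rps`, hypothetical), and any offered load beyond that capacity is marked for serverless execution while services below capacity are left untouched. The names `split_load` and `overloaded_services` are illustrative and are not Flare's API.

```python
from typing import Dict, List, Tuple

# Hypothetical plan format:
# service -> (requests/s kept on VMs, requests/s offloaded to serverless)
LoadPlan = Dict[str, Tuple[float, float]]


def split_load(offered_rps: Dict[str, float],
               vm_capacity_rps: Dict[str, float]) -> LoadPlan:
    """Keep each service's load on its VMs up to their (assumed) current
    capacity and mark only the excess for serverless execution."""
    plan: LoadPlan = {}
    for service, load in offered_rps.items():
        capacity = vm_capacity_rps[service]
        on_vms = min(load, capacity)             # steady-state share stays on VMs
        plan[service] = (on_vms, load - on_vms)  # excess (>= 0) goes to serverless
    return plan


def overloaded_services(plan: LoadPlan) -> List[str]:
    """Services whose offered load exceeds VM capacity: the only ones that offload."""
    return [service for service, (_, excess) in plan.items() if excess > 0]


if __name__ == "__main__":
    # Hypothetical three-service chain during a spike of 1200 requests/s.
    offered = {"frontend": 1200, "cart": 1200, "payment": 1200}
    capacity = {"frontend": 1500, "cart": 1000, "payment": 1300}
    plan = split_load(offered, capacity)
    print(plan)                       # {'frontend': (1200, 0), 'cart': (1000, 200), 'payment': (1200, 0)}
    print(overloaded_services(plan))  # ['cart']
```

In this sketch, only `cart` sends traffic to serverless, mirroring the idea of offloading the excess of only the overloaded service(s). As an auto-scaler adds VMs and the assumed capacity grows, the offloaded share shrinks toward zero without any change to the application itself.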