AI Summary
This work addresses a performance bottleneck in serverless computing: Kubernetes controllers exchange state through the API Server, and this message passing becomes the dominant cost when a burst of function instances must be scaled out. To overcome this limitation, the authors propose KUBEDIRECT, a Kubernetes-based cluster manager for Function-as-a-Service (FaaS) platforms that exploits a "narrow waist" common across FaaS systems. Because the narrow waist is sequential, controllers can pass messages to each other directly and bypass the API Server. To handle the ephemeral state this creates, the system uses a state management scheme that treats the narrow waist as a hierarchical write-back cache, ensuring consistency and convergence to the desired state. The integration adds approximately 150 lines of code per controller. Experiments show that the system reduces serving latency by 26.7× compared to Knative and performs on par with the state-of-the-art clean-slate platform Dirigent, while remaining compatible with the existing Kubernetes ecosystem.
Abstract
FaaS platforms rely on cluster managers like Kubernetes for resource management. Kubernetes is popular due to its state-centric APIs that decouple the control plane into modular controllers. However, to scale out a burst of FaaS instances, message passing becomes the primary bottleneck, as controllers have to exchange extensive state through the API Server. Existing solutions opt for a clean-slate redesign of cluster managers, but at the expense of compatibility with the existing ecosystem and at the cost of substantial engineering effort. We present KUBEDIRECT, a Kubernetes-based cluster manager for FaaS. We find that there exists a common narrow waist across FaaS platforms that allows us to achieve both efficiency and external compatibility. Our insight is that the sequential structure of the narrow waist obviates the need for a single source of truth, allowing us to bypass the API Server and perform direct message passing for efficiency. However, our approach introduces a set of ephemeral states across controllers, making it challenging to enforce end-to-end semantics due to the absence of centralized coordination. KUBEDIRECT employs a novel state management scheme that leverages the narrow waist as a hierarchical write-back cache, ensuring consistency and convergence to the desired state. KUBEDIRECT can seamlessly integrate with Kubernetes, adding ~150 LoC per controller. Experiments show that KUBEDIRECT reduces serving latency by 26.7x over Knative, and has performance similar to that of the state-of-the-art clean-slate platform Dirigent.
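To make the write-back idea concrete, here is a minimal Python sketch of a sequential chain of controllers that forward desired state directly to each other and write to a central store only on flush. It is an illustration under stated assumptions, not KUBEDIRECT's actual protocol: the `ApiServer` and `Stage` classes, the stage names, and the `flush` policy are all hypothetical, and the sketch models neither failures nor the consistency guarantees the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


class ApiServer:
    """Stand-in for a central store; counts writes so the savings are visible."""

    def __init__(self) -> None:
        self.store: Dict[str, int] = {}
        self.writes = 0

    def put(self, key: str, value: int) -> None:
        self.store[key] = value
        self.writes += 1


@dataclass
class Stage:
    """One controller in a sequential chain (hypothetical, for illustration)."""

    name: str
    api: ApiServer
    downstream: Optional["Stage"] = None
    cache: Dict[str, int] = field(default_factory=dict)  # ephemeral local state
    dirty: Set[str] = field(default_factory=set)         # keys not yet written back

    def set_desired(self, key: str, replicas: int) -> None:
        # Update local state and forward directly to the next stage,
        # without a round trip through the central store.
        self.cache[key] = replicas
        self.dirty.add(key)
        if self.downstream is not None:
            self.downstream.set_desired(key, replicas)

    def flush(self) -> None:
        # Write back only the latest value of each dirty key.
        for key in sorted(self.dirty):
            self.api.put(f"{self.name}/{key}", self.cache[key])
        self.dirty.clear()


def build_chain(api: ApiServer, names: List[str]) -> List[Stage]:
    """Link the stages in order so each one forwards to its successor."""
    stages = [Stage(name=n, api=api) for n in names]
    for up, down in zip(stages, stages[1:]):
        up.downstream = down
    return stages
```

In this toy model, a burst of 100 updates to one key across a three-stage chain produces no central writes until the stages flush, and then only one write per stage, where a write-through design would issue one per stage per update.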