🤖 AI Summary
This work addresses the performance bottlenecks, high resource overhead, and weak isolation inherent in traditional sidecar-based Layer 7 (L7) load balancers within microservice architectures, particularly under high concurrency, long call chains, and co-located deployment scenarios. To overcome these limitations, the authors propose XLB, a novel architecture that offloads L7 load balancing logic into the Linux kernel socket layer using eBPF. XLB introduces a socket redirection mechanism and a nested eBPF map structure to enable efficient connection management and state maintenance, eliminating costly data copies and context switches between user and kernel space. Experimental results demonstrate that, in environments with over 50 microservice instances, XLB achieves up to 1.5× higher throughput and 60% lower end-to-end latency compared to Istio and Cilium.
📝 Abstract
L7 load balancers are a fundamental building block in microservices as they enable fine-grained traffic distribution. Compared to monolithic applications, microservices demand higher performance and stricter isolation from load balancers, owing to the increased number of instances, longer service chains, and the necessity of co-locating load balancers with services on the same host. Traditional sidecar-based load balancers are ill-equipped to meet these demands, often resulting in significant performance degradation. In this work, we present XLB, a novel architecture that reshapes the L7 load balancer as an in-kernel interposition layer operating at the socket layer. We leverage eBPF to implement the core load balancing logic in the kernel, and address the challenges of connection management and state maintenance through novel socket-layer redirection and nested eBPF map designs. XLB eliminates the extra overhead of scheduling, communication, and data movement, resulting in a more lightweight, scalable, and efficient L7 load balancer architecture. Compared to widely used microservice load balancers (Istio and Cilium) in deployments with over 50 microservice instances, XLB achieves up to 1.5x higher throughput and 60% lower end-to-end latency.
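To make the socket-layer interposition idea concrete, the sketch below shows what an eBPF `sk_msg` program in this style might look like. It is only an illustration of the general technique (a sockmap-based redirect plus an eBPF map holding per-service load-balancing state); all identifiers, the map layout, and the service-lookup logic are assumptions, not taken from the XLB paper, and the actual design uses nested maps rather than the single flat state map shown here.

```c
// SPDX-License-Identifier: GPL-2.0
// Illustrative sketch only: a minimal sk_msg program that redirects
// application payloads between sockets at the socket layer, bypassing
// the TCP/IP stack. Names and map layout are hypothetical.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct conn_key {
    __u32 service_id;   /* which microservice this message targets */
    __u32 backend_id;   /* backend chosen by the balancing policy  */
};

/* Established backend sockets, inserted by a user-space control plane. */
struct {
    __uint(type, BPF_MAP_TYPE_SOCKHASH);
    __uint(max_entries, 65536);
    __type(key, struct conn_key);
    __type(value, __u64);
} backend_socks SEC(".maps");

/* Per-service balancing state (e.g., a round-robin cursor). In XLB this
 * kind of state reportedly lives in nested eBPF maps; a single hash map
 * stands in for that structure here. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);    /* service_id */
    __type(value, __u32);  /* chosen backend index */
} lb_state SEC(".maps");

SEC("sk_msg")
int lb_redirect(struct sk_msg_md *msg)
{
    __u32 service_id = msg->local_port;   /* placeholder service lookup */
    __u32 *backend = bpf_map_lookup_elem(&lb_state, &service_id);
    if (!backend)
        return SK_PASS;                   /* no policy: use normal path */

    struct conn_key key = {
        .service_id = service_id,
        .backend_id = *backend,
    };
    /* Splice the payload directly into the backend socket's receive
     * queue, avoiding user-space copies and context switches. */
    return bpf_msg_redirect_hash(msg, &backend_socks, &key, BPF_F_INGRESS);
}

char _license[] SEC("license") = "GPL";
```

In this pattern, the redirect happens before data ever leaves the kernel, which is the source of the savings the abstract attributes to eliminating scheduling, communication, and data-movement overhead; a sidecar proxy would instead traverse the full network stack twice and cross the user/kernel boundary for every request.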