Signalling Health for Improved Kubernetes Microservice Availability

📅 2025-07-02
🤖 AI Summary
Problem: Poll-based container monitoring (PCM) in Kubernetes requires careful parameter tuning, can be slow to detect container health changes, and can erroneously report failures, which degrades service availability. Method: The authors design, implement, and evaluate a Signal-based Container Monitoring (SCM) approach for Kubernetes, in which the container notifies the orchestrator when its status changes instead of being polled periodically, and they present a new mathematical model that predicts its benefits over PCM. SCM requires no tuning of polling intervals. Results: SCM detects container failure 86% faster than PCM and detects container readiness in a comparable time, with limited resource overhead. PCM's erroneous failure detections reduce service availability by 4%, a loss that SCM avoids. The comparison covers six experiments on the SockShop benchmark.

📝 Abstract
Microservices are often deployed and managed by a container orchestrator that can detect and fix failures to maintain the service availability critical in many applications. In Poll-based Container Monitoring (PCM), the orchestrator periodically checks container health. While a common approach, PCM requires careful tuning, may degrade service availability, and can be slow to detect container health changes. An alternative is Signal-based Container Monitoring (SCM), where the container signals the orchestrator when its status changes. We present the design, implementation, and evaluation of an SCM approach for Kubernetes and empirically show that it has benefits over PCM, as predicted by a new mathematical model. We compare the service availability of SCM and PCM over six experiments using the SockShop benchmark. SCM does not require that polling intervals are tuned, and yet detects container failure 86% faster than PCM and container readiness in a comparable time with limited resource overheads. We find PCM can erroneously detect failures, and this reduces service availability by 4%. We propose that orchestrators offer SCM features for faster failure detection than PCM without erroneous detections or careful tuning.
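To make the contrast between polling and signalling concrete, the following is a minimal sketch of detection delay under each scheme. It is not the paper's mathematical model. It assumes probes fire at fixed multiples of the polling period, that the orchestrator acts only after a number of consecutive failed probes (parameters that mirror the Kubernetes probe fields `periodSeconds` and `failureThreshold`), and that a signal arrives after a fixed notification latency. The function names and the example latency value are hypothetical.

```python
import math


def poll_detection_delay(failure_time, period, failure_threshold=1):
    """Seconds between a failure and its detection under periodic polling.

    Assumption: probes fire at t = 0, period, 2 * period, ... and every
    probe at or after `failure_time` fails.
    """
    # Index of the first probe that can observe the failure.
    first_failed_probe = math.ceil(failure_time / period)
    # The orchestrator reacts only after `failure_threshold` consecutive failures.
    detect_time = (first_failed_probe + failure_threshold - 1) * period
    return detect_time - failure_time


def signal_detection_delay(notify_latency):
    """Seconds between a failure and its detection when the container signals.

    Assumption: the only delay is the (hypothetical) time the notification
    takes to reach the orchestrator.
    """
    return notify_latency


if __name__ == "__main__":
    # Failure 12 s after start, probes every 10 s, 3 consecutive failures required.
    print(poll_detection_delay(12.0, 10.0, 3))
    # Illustrative notification latency of 0.5 s.
    print(signal_detection_delay(0.5))
```

In this sketch the polling delay depends on both the period and the threshold, which is why PCM needs tuning: shortening either speeds up detection but makes a transient probe failure more likely to be treated as a real one. The signalling delay has no such parameters.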
Problem

Research questions and friction points this paper is trying to address.

Compares SCM and PCM for Kubernetes microservice availability
Evaluates faster failure detection without tuning in SCM
Addresses erroneous failure detection in PCM and the resource overhead of SCM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Signal-based Container Monitoring for Kubernetes
Faster failure detection than polling
No polling-interval tuning needed; avoids erroneous failure detections