🤖 AI Summary
Problem: Jointly ensuring reliability and resilience for mission-critical, latency-sensitive services remains challenging, because the network must both sustain timely delivery under normal operation and recover quickly after failures.
Method: This paper formulates the multi-commodity least-cost resilient and reliable network control (MC-LC-ResRNC) problem as a stochastic control problem that jointly captures reliability under normal operation and recovery after failures. The problem is subject to long-term timely-throughput constraints (reliability) and short-term timely-throughput constraints (post-failure recovery). The paper then proposes the Multi-Commodity Resilient and Reliable Cloud Network Control (MC-ResRCNC) algorithm for dynamic resource orchestration across time scales.
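To make the two constraint types concrete, here is a minimal sketch that checks a per-slot timely-throughput trace against a long-term target (average over the whole horizon) and a short-term target (average over every sliding window). The sliding-window reading of the short-term constraint, the function names, and the window length are assumptions for illustration. This is not the MC-ResRCNC algorithm, which makes control decisions rather than merely checking constraints.

```python
from typing import Sequence


def long_term_ok(rate_per_slot: Sequence[float], target: float) -> bool:
    """Long-term constraint: the time-averaged timely throughput over the
    whole horizon must be at least `target`."""
    if not rate_per_slot:
        raise ValueError("empty trace")
    return sum(rate_per_slot) / len(rate_per_slot) >= target


def short_term_ok(rate_per_slot: Sequence[float], target: float, window: int) -> bool:
    """Short-term constraint (assumed sliding-window form): every window of
    `window` consecutive slots must average at least `target`."""
    if window < 1 or window > len(rate_per_slot):
        raise ValueError("window must be between 1 and the trace length")
    for start in range(len(rate_per_slot) - window + 1):
        avg = sum(rate_per_slot[start:start + window]) / window
        if avg < target:
            return False
    return True


# Hypothetical trace: a brief outage in slots 6-7 depresses timely throughput.
rates = [0.95] * 6 + [0.4, 0.5] + [0.95] * 6
print(long_term_ok(rates, 0.85))             # True  (average is about 0.88)
print(short_term_ok(rates, 0.85, window=4))  # False (some windows average 0.7)
```

In this example the long-term average stays above 0.85 even though several 4-slot windows drop to 0.7, which illustrates why a long-term constraint alone does not capture how quickly performance is restored.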
Contribution/Results: Numerical experiments show that MC-ResRCNC improves timely throughput under normal operation and reduces end-to-end recovery time by 58% under node or link failures. It outperforms state-of-the-art approaches in both reliability and resilience. This work provides a verifiable, deployable resilient and reliable control framework for latency-sensitive services.
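Recovery time can be measured from the same kind of per-slot timely-throughput trace. The sketch below assumes a hypothetical definition: the number of slots from the failure until timely throughput returns to the target and stays there for `hold` consecutive slots, so that a single lucky slot is not counted as recovery. The function name, the `hold` rule, and the slot-based units are illustrative assumptions; the paper's exact recovery metric may differ.

```python
from typing import Optional, Sequence


def recovery_time(rate_per_slot: Sequence[float], failure_slot: int,
                  target: float, hold: int = 3) -> Optional[int]:
    """Slots from `failure_slot` until timely throughput is at or above
    `target` for `hold` consecutive slots; None if it never recovers
    within the trace. (Hypothetical metric for illustration.)"""
    if hold < 1:
        raise ValueError("hold must be at least 1")
    run = 0  # length of the current run of slots meeting the target
    for t in range(failure_slot, len(rate_per_slot)):
        if rate_per_slot[t] >= target:
            run += 1
            if run == hold:
                # Recovery is declared at the first slot of the qualifying run.
                return t - hold + 1 - failure_slot
        else:
            run = 0
    return None


# Hypothetical trace: failure at slot 5, gradual recovery afterwards.
trace = [0.95] * 5 + [0.3, 0.5, 0.7, 0.9, 0.92, 0.95]
print(recovery_time(trace, failure_slot=5, target=0.9))  # 3 (back on target from slot 8)
```

Requiring a sustained run rather than a single slot makes the metric less sensitive to transient spikes while the network is still re-routing traffic.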
📝 Abstract
The proliferation of mission-critical, latency-sensitive services has intensified the demand for next-generation cloud-integrated networks to guarantee both reliable and resilient service delivery. Reliability imposes timely-throughput requirements, i.e., the percentage of packets to be delivered within a prescribed per-packet deadline, whereas resilience relates to the network's ability to swiftly recover timely-throughput performance following an outage event, such as a node or link failure. While recent studies have increasingly focused on designing reliable network control policies, a comprehensive solution that combines reliable and resilient network control has yet to be fully explored. This paper formulates the multi-commodity least-cost resilient and reliable network control (MC-LC-ResRNC) problem as a stochastic control problem with long- and short-term timely-throughput constraints. We then present a solution through the Multi-Commodity Resilient and Reliable Cloud Network Control (MC-ResRCNC) algorithm and show through numerical experiments that it jointly ensures reliability under normal conditions and resilience upon network failure.
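The timely-throughput metric described in the abstract can be computed from per-packet delays. Below is a minimal sketch, assuming that all packets of one commodity share a single deadline and that undelivered packets (for example, packets lost during an outage) are recorded as `None`; the function name and input format are hypothetical.

```python
from typing import Optional, Sequence


def timely_throughput(delays: Sequence[Optional[float]], deadline: float) -> float:
    """Fraction of packets delivered within `deadline`.

    `delays[i]` is the end-to-end delay of packet i, or None if the packet
    was never delivered. Late and undelivered packets both count as misses."""
    if not delays:
        raise ValueError("no packets")
    on_time = sum(1 for d in delays if d is not None and d <= deadline)
    return on_time / len(delays)


# Hypothetical delays (ms) for five packets with a 5 ms deadline:
# three arrive in time, one is late, one is lost.
print(timely_throughput([2.0, 5.0, None, 3.0, 7.5], deadline=5.0))  # 0.6
```

Counting lost packets in the denominator matters here: a metric computed only over delivered packets would hide exactly the outage losses that resilience is meant to address.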