FailLite: Failure-Resilient Model Serving for Resource-Constrained Edge Environments

📅 2025-04-22
🤖 AI Summary
Hardware failures can interrupt deep learning inference on resource-constrained edge devices, where conventional full-replica fault tolerance is infeasible. To address this, the paper proposes FailLite, a lightweight heterogeneous fault-tolerant inference framework with three key innovations: (1) a heterogeneous replication mechanism that uses smaller variants of the original model as failover replicas instead of full-model copies; (2) coordinated warm/cold replica scheduling with progressive failover to minimize disruption; and (3) an edge-deployable prototype system integrating dynamic state management and tiered fault response. Evaluated across 27 models, the framework achieves a mean time to recovery (MTTR) of 175.5 ms with only a 0.6% reduction in accuracy, significantly outperforming traditional replication schemes. The solution thus delivers high availability, low resource overhead, and rapid recovery, making it well suited to edge AI deployments.

📝 Abstract
Model serving systems have become popular for deploying deep learning models for various latency-sensitive inference tasks. While traditional replication-based methods have been used for failure-resilient model serving in the cloud, such methods are often infeasible in edge environments, where significant resource constraints preclude full replication. To address this problem, this paper presents FailLite, a failure-resilient model serving system that employs (i) heterogeneous replication, where failover models are smaller variants of the original model, (ii) an intelligent approach that uses warm replicas to ensure quick failover for critical applications, and (iii) cold replicas with progressive failover to provide low mean time to recovery (MTTR) for the remaining applications. We implement a full prototype of our system and demonstrate its efficacy on an experimental edge testbed. Our results using 27 models show that FailLite can recover all failed applications with 175.5 ms MTTR and only a 0.6% reduction in accuracy.
Problem

Research questions and friction points this paper is trying to address.

Enables failure-resilient model serving in resource-constrained edge environments
Uses heterogeneous replication with smaller failover models to save resources
Provides quick failover for critical apps with low accuracy reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous replication with smaller failover models
Intelligent warm and cold replica usage
Progressive failover for low recovery time
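
The warm/cold split listed above can be illustrated with a small sketch. It is not FailLite's actual placement algorithm, which this summary does not describe. It assumes a simple greedy policy: critical applications get a warm (already loaded) replica, using the most accurate variant that fits a shared memory budget, and every other application falls back to a cold replica of its smallest variant. The names `Variant`, `App`, `plan_replicas`, and `warm_budget_mb`, along with the model names and numbers in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Variant:
    """One candidate failover model (hypothetical fields)."""
    name: str
    memory_mb: int
    accuracy: float


@dataclass
class App:
    """An application with its candidate failover variants."""
    name: str
    critical: bool
    variants: List[Variant]


def plan_replicas(apps: List[App], warm_budget_mb: int) -> Dict[str, Tuple[str, str]]:
    """Greedy sketch: map each app name to ("warm" | "cold", variant name).

    Assumption: only critical apps compete for warm memory; everything
    else is served from a cold replica of its smallest variant.
    """
    plan: Dict[str, Tuple[str, str]] = {}
    remaining = warm_budget_mb
    # Handle critical apps first so they get first pick of the budget.
    for app in sorted(apps, key=lambda a: not a.critical):
        choice = None
        if app.critical:
            # Most accurate variant that still fits the remaining budget.
            by_accuracy = sorted(app.variants, key=lambda v: v.accuracy, reverse=True)
            choice = next((v for v in by_accuracy if v.memory_mb <= remaining), None)
        if choice is not None:
            remaining -= choice.memory_mb
            plan[app.name] = ("warm", choice.name)
        else:
            # Cold replica: keep only the smallest variant on disk.
            smallest = min(app.variants, key=lambda v: v.memory_mb)
            plan[app.name] = ("cold", smallest.name)
    return plan
```

For example, with a 60 MB warm budget, a critical app offering a 100 MB and a 45 MB variant would be assigned the 45 MB variant as a warm replica, while a non-critical app would be assigned a cold replica. A real system would also have to account for load times, device placement, and accuracy targets, which this sketch ignores.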