EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge

📅 2024-05-29
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Edge computing environments face stringent constraints, including limited device memory, volatile network conditions, and strict power budgets, that hinder efficient deep neural network (DNN) inference. Method: This paper proposes a modeless inference framework that dynamically selects a DNN model for each request based on its accuracy and resource requirements. It introduces an edge-data center (edge-DC) collaborative architecture, a confidence-scaling mechanism that compresses the candidate model set, and adaptive lossy inference under network fluctuations. Combined with FPGA-based hardware acceleration and confidence-driven fine-grained scheduling, the framework jointly optimizes energy efficiency and robustness. Contribution/Results: Experiments demonstrate up to a 1.6× reduction in P99 latency and up to a 3.34× decrease in FPGA prototype power consumption at equivalent accuracy, delivering a cost-effective solution for intelligent visual services at the edge.

📝 Abstract
Traditional ML inference is evolving toward modeless inference, which abstracts the complexity of model selection away from users, allowing the system to automatically choose the most appropriate model for each request based on accuracy and resource requirements. While prior studies have focused on modeless inference within data centers, this paper tackles the pressing need for cost-efficient modeless inference at the edge, particularly under its unique constraints of limited device memory, volatile network conditions, and restricted power consumption. To overcome these challenges, we propose EdgeSight, a system that provides cost-efficient modeless serving for diverse DNNs at the edge. EdgeSight employs an edge-data center (edge-DC) architecture, utilizing confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. Additionally, it supports lossy inference in volatile network environments. Our experimental results show that EdgeSight outperforms existing systems by up to 1.6× in P99 latency for modeless services. Furthermore, our FPGA prototype delivers comparable performance at certain accuracy levels while reducing power consumption by up to 3.34×.
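The abstract's core idea, pruning the candidate model set to meet an accuracy target and then escalating to larger models only when a cheap model's prediction confidence is low, can be sketched as follows. This is a minimal illustration of confidence-thresholded model cascading, not the paper's actual implementation; the `Model` type, the `run` callback, and the fixed confidence threshold are all assumptions introduced for the example.

```python
# Hypothetical sketch of accuracy-aware model selection with confidence
# scaling (illustrative only; not EdgeSight's real code).
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    accuracy: float  # offline-profiled accuracy of this model
    cost: float      # relative latency/energy cost


def select_candidates(models, required_accuracy):
    """Keep only models meeting the accuracy requirement, cheapest first."""
    ok = [m for m in models if m.accuracy >= required_accuracy]
    return sorted(ok, key=lambda m: m.cost)


def cascade_infer(candidates, run, confidence_threshold=0.9):
    """Try cheap models first; escalate when prediction confidence is low.

    `run(model)` is a caller-supplied callback returning (label, confidence).
    """
    for model in candidates[:-1]:
        label, confidence = run(model)
        if confidence >= confidence_threshold:
            return label
    # Fall back to the most expensive (highest-accuracy) candidate.
    label, _ = run(candidates[-1])
    return label
```

In this reading, confidence scaling shrinks the set of models the scheduler must consider (only those above the accuracy floor), while the cascade keeps average cost low by paying for large models only on hard inputs.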
Problem

Research questions and friction points this paper is trying to address.

Edge Computing
Modeless Inference
Resource Constraint
Innovation

Methods, ideas, or system contributions that make the work stand out.

Edge Computing
Model Selection Optimization
Power Efficiency