🤖 AI Summary
Edge computing environments face stringent constraints, including limited device memory, volatile network conditions, and strict power budgets, that hinder efficient deep neural network (DNN) inference. Method: This paper proposes EdgeSight, a modeless inference framework in which the system, rather than the user, dynamically selects a DNN model for each request according to its accuracy and resource requirements. It introduces an edge–data center (edge-DC) collaborative architecture, a confidence-scaling mechanism that reduces the number of candidate models while still meeting diverse accuracy requirements, and lossy inference for volatile network conditions. Integrated with FPGA-based hardware acceleration and confidence-driven fine-grained scheduling, the framework jointly optimizes energy efficiency and robustness. Contribution/Results: Experiments show up to 1.6× better P99 latency than existing systems for modeless services, and an FPGA prototype delivers similar performance at certain accuracy levels while reducing power consumption by up to 3.34×, offering a cost-efficient option for intelligent visual services at the edge.
📝 Abstract
Traditional ML inference is evolving toward modeless inference, which hides the complexity of model selection from users: the system automatically chooses the most appropriate model for each request based on its accuracy and resource requirements. While prior studies have focused on modeless inference within data centers, this paper tackles the pressing need for cost-efficient modeless inference at the edge, where device memory is limited, network conditions are volatile, and power consumption is restricted. To overcome these challenges, we propose EdgeSight, a system that provides cost-efficient modeless serving for diverse DNNs at the edge. EdgeSight employs an edge-data center (edge-DC) architecture and uses confidence scaling to reduce the number of model options while still meeting diverse accuracy requirements. Additionally, it supports lossy inference in volatile network environments. Our experimental results show that EdgeSight outperforms existing systems by up to 1.6x in P99 latency for modeless services. Furthermore, our FPGA prototype demonstrates similar performance at certain accuracy levels while reducing power consumption by up to 3.34x.
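The core idea of modeless inference, where the system rather than the user picks a model per request, can be illustrated with a minimal sketch. This is not EdgeSight's actual algorithm, which the abstract does not specify. It assumes a hypothetical `ModelProfile` record with offline-profiled accuracy, memory footprint, and latency, and a hypothetical `select_model` helper that returns the lowest-latency model satisfying a request's accuracy floor within the device's memory budget. All names and numbers below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical offline profile of one candidate DNN (illustrative only)."""
    name: str
    accuracy: float    # profiled accuracy in [0, 1]
    memory_mb: float   # memory needed to keep the model resident on the device
    latency_ms: float  # profiled inference latency on the edge device


def select_model(
    models: Sequence[ModelProfile],
    min_accuracy: float,
    memory_budget_mb: float,
) -> Optional[ModelProfile]:
    """Return the fastest model meeting the accuracy floor and memory budget.

    Returns None when no candidate qualifies; a real system might then
    forward the request to the data center instead.
    """
    feasible = [
        m for m in models
        if m.accuracy >= min_accuracy and m.memory_mb <= memory_budget_mb
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda m: m.latency_ms)


# Made-up candidate set, used only to exercise the sketch.
CANDIDATES = [
    ModelProfile("small", accuracy=0.70, memory_mb=20, latency_ms=5),
    ModelProfile("medium", accuracy=0.76, memory_mb=100, latency_ms=15),
    ModelProfile("large", accuracy=0.80, memory_mb=350, latency_ms=40),
]

if __name__ == "__main__":
    choice = select_model(CANDIDATES, min_accuracy=0.75, memory_budget_mb=128)
    print(choice.name if choice else "no feasible model on this device")
```

The caller states only an accuracy requirement, and the choice of model stays internal to the system. Shrinking the candidate set, which the abstract attributes to confidence scaling, would reduce how many such profiles must be stored and kept resident on a memory-limited edge device.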