🤖 AI Summary
A fundamental tension exists between resource constraints on edge devices and the high communication overhead of cloud-based inference, making it challenging to jointly optimize computation, communication, and inference performance. To address this, we propose an edge–cloud collaborative inference framework centered on a knowledge-adaptive mechanism that enables bidirectional knowledge transfer for joint compression and co-optimization of lightweight edge models and powerful cloud models. Methodologically, the framework integrates knowledge distillation with model adaptation to construct a dynamically adjustable hierarchical inference architecture. Evaluated on image classification and object detection tasks, it achieves state-of-the-art (SOTA) accuracy while significantly reducing edge-side computational load (average reduction of 42%) and edge-to-cloud data transmission volume (up to 68% compression). These improvements effectively alleviate the multi-objective trade-offs inherent in edge–cloud systems.
📝 Abstract
The rapid growth of edge AI has made machine learning applications ubiquitous across domains. Although running inference on edge devices is efficient in computation and communication, their limited computational resources make reliance on more powerful cloud-side systems inevitable in most cases. Cloud inference systems can achieve the best performance, but their computation and communication costs rise dramatically as the number of edge devices relying on them grows. Hence, there is a trade-off among the computation, communication, and performance of these systems. In this paper, we propose a novel framework, dubbed Eccentric, that learns models with different levels of trade-off between these conflicting objectives. Based on an adaptation of knowledge from the edge model to the cloud one, this framework reduces the computation and communication costs of the system during inference while achieving the best performance possible. Eccentric can be viewed as a new form of compression method suited to edge-cloud inference systems, reducing both computation and communication costs. Empirical studies on classification and object detection tasks corroborate the efficacy of this framework.
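To make the trade-off among computation, communication, and performance concrete, here is a minimal sketch of how the three quantities can be tallied for a generic edge-cloud system. It is not the Eccentric method: it assumes a simple confidence-threshold offloading rule, and the names `Sample`, `evaluate`, `edge_confidence`, and `payload_bytes` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    # All fields are hypothetical stand-ins for per-input evaluation records.
    edge_confidence: float  # top-class probability reported by the edge model
    edge_correct: bool      # whether the edge model's prediction is right
    cloud_correct: bool     # whether the cloud model's prediction is right


def evaluate(samples: List[Sample], threshold: float,
             payload_bytes: int = 1000) -> Tuple[float, float, int]:
    """Return (accuracy, offload_fraction, bytes_sent) for one threshold.

    Assumed rule (not from the paper): an input stays on the edge when the
    edge model's confidence is at least `threshold`; otherwise it is sent
    to the cloud, which costs `payload_bytes` of communication.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    correct = 0
    offloaded = 0
    for s in samples:
        if s.edge_confidence >= threshold:
            correct += int(s.edge_correct)   # answered locally
        else:
            offloaded += 1                   # sent to the cloud
            correct += int(s.cloud_correct)
    n = len(samples)
    return correct / n, offloaded / n, offloaded * payload_bytes


if __name__ == "__main__":
    # Toy records, invented purely for illustration.
    data = [
        Sample(0.9, True, True),
        Sample(0.6, False, True),
        Sample(0.3, False, True),
        Sample(0.8, True, True),
    ]
    for t in (0.0, 0.7, 1.1):
        print(t, evaluate(data, t))
```

Sweeping the threshold in this toy setting moves the system between an edge-only operating point (no communication, lower accuracy) and a cloud-only one (full communication, cloud-level accuracy). This is the kind of spectrum of operating points the abstract refers to as different levels of trade-off.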