🤖 AI Summary
To address load imbalance, high communication overhead, and poor adaptability to hardware heterogeneity in collaborative DNN inference on edge devices, this paper proposes a flexible combinatorial optimization framework that enables dynamic model partitioning, cross-device reallocation, and topology-aware joint computation-communication scheduling, achieving efficient heterogeneous collaboration without accuracy loss. The method integrates integer linear programming (ILP), lightweight heuristic search, and runtime adaptive scheduling, and is compatible with mainstream edge hardware and inference backends, including TensorRT and TVM. Evaluated on a real-world edge cluster, the approach achieves a 2.3× average inference speedup, a 47% reduction in communication volume, and 61% lower end-to-end latency compared with PipeDream and SplitNN. These results demonstrate significant improvements in both inference efficiency and deployment flexibility for heterogeneous edge inference systems.
📝 Abstract
The rapid advancement of deep learning has catalyzed the development of novel IoT applications, which often deploy pre-trained deep neural network (DNN) models across multiple edge devices for collaborative inference.