🤖 AI Summary
To address the trade-off among accuracy, latency, and computational load in video object recognition on resource-constrained devices (e.g., traffic cameras) in mobile edge networks, this paper proposes LTED-Ada, an adaptive framework. Methodologically, it introduces the first deep reinforcement learning (DRL)-based policy for dynamically switching between lightweight local tracking and high-accuracy edge detection. It also integrates federated learning so that heterogeneous devices can train the policy collaboratively while preserving privacy, which improves generalization across scenarios. The approach combines lightweight tracking, neural-network detection, edge offloading, and distributed learning, and is validated through hardware-in-the-loop experiments. Results show that, under diverse frame rates and performance constraints, LTED-Ada reduces end-to-end latency (average reduction of 32.7%) while improving recognition accuracy (average gain of 8.4%) over baseline methods. The framework is presented as practical and scalable for real-world edge intelligence deployments.
📝 Abstract
Fast and accurate video object recognition, which relies on frame-by-frame video analytics, remains a challenge for resource-constrained devices such as traffic cameras. Recent advances in mobile edge computing have made it possible to offload computation-intensive object detection to edge servers equipped with high-accuracy neural networks, while lightweight and fast object tracking algorithms run locally on devices. This hybrid approach offers a promising solution but introduces a new challenge: deciding when to perform edge detection versus local tracking. To address this, we formulate two long-term optimization problems, one for the single-device scenario and one for the multi-device scenario, taking into account the temporal correlation of consecutive frames and the dynamic conditions of mobile edge networks. Based on this formulation, we propose LTED-Ada for the single-device setting, a deep reinforcement learning-based algorithm that adaptively selects between local tracking and edge detection according to the frame rate and the recognition accuracy and delay requirements. In the multi-device setting, we further enhance LTED-Ada using federated learning to enable collaborative policy training across devices, thereby improving its generalization to unseen frame rates and performance requirements. Finally, we conduct extensive hardware-in-the-loop experiments using multiple Raspberry Pi 4B devices and a personal computer as the edge server, demonstrating the superiority of LTED-Ada.
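The track-versus-detect decision described in the abstract can be illustrated with a toy model. The sketch below is not the paper's method: it swaps in plain value iteration on a small hand-made Markov decision process in place of deep reinforcement learning, and every name and number (accuracy decay, delays, delay weight, discount factor) is an assumption chosen only for illustration. The state is the number of frames since the last edge detection, local tracking is assumed to lose accuracy as that count grows, and edge detection is assumed to restore accuracy at a higher delay.

```python
# Toy model of the track-vs-detect decision.
# All constants are illustrative assumptions, not values from the paper.
ACTIONS = ("track", "detect")
MAX_STALENESS = 5     # cap on "frames since the last edge detection"
ACC_DETECT = 0.90     # assumed accuracy right after an edge detection
TRACK_DECAY = 0.85    # assumed per-frame accuracy decay of local tracking
DELAY_TRACK = 0.02    # assumed local tracking delay (seconds)
DELAY_EDGE = 0.20     # assumed offload + edge detection delay (seconds)
DELAY_WEIGHT = 1.0    # assumed penalty per second of delay
GAMMA = 0.9           # discount factor for the long-term objective


def step(staleness, action):
    """Return (reward, next_staleness) for processing one frame."""
    if action == "detect":
        # Edge detection resets staleness but pays the larger delay.
        return ACC_DETECT - DELAY_WEIGHT * DELAY_EDGE, 0
    # Local tracking is fast, but accuracy decays with each tracked frame.
    accuracy = ACC_DETECT * TRACK_DECAY ** (staleness + 1)
    next_staleness = min(staleness + 1, MAX_STALENESS)
    return accuracy - DELAY_WEIGHT * DELAY_TRACK, next_staleness


def solve(iterations=200):
    """Run value iteration and return the greedy action for each state."""
    values = [0.0] * (MAX_STALENESS + 1)
    for _ in range(iterations):
        new_values = []
        for s in range(MAX_STALENESS + 1):
            outcomes = [step(s, a) for a in ACTIONS]
            new_values.append(max(r + GAMMA * values[n] for r, n in outcomes))
        values = new_values

    policy = []
    for s in range(MAX_STALENESS + 1):
        q = {}
        for a in ACTIONS:
            reward, nxt = step(s, a)
            q[a] = reward + GAMMA * values[nxt]
        policy.append(max(q, key=q.get))
    return policy


policy = solve()
for staleness, action in enumerate(policy):
    print(f"frames since last detection = {staleness}: {action}")
```

Under these assumed numbers the resulting policy tracks locally right after a detection and offloads again once the tracked result has gone stale. A learned policy such as the one the abstract describes would additionally condition on the frame rate and on the accuracy and delay requirements, which this sketch leaves out.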