🤖 AI Summary
To address the real-time, efficiency, and privacy challenges raised by the explosive growth of video data, this paper surveys distributed video analytics in cloud-edge-terminal collaborative (CETC) systems, covering hierarchical, distributed, and hybrid architectures together with edge computing platforms and resource management mechanisms. It organizes existing work into three approaches: edge-centric processing (on-device inference, edge-assisted offloading, and edge intelligence), cloud-centric analytics for complex video understanding and model training, and hybrid analytics built on adaptive task offloading and resource-aware scheduling. It also reviews recent advances in large language models (LLMs) and multimodal integration, and discusses the opportunities and challenges they raise for platform scalability, data protection, and system reliability. Target applications include video surveillance, smart cities, and autonomous driving. Throughout, the survey considers the trade-offs among performance, latency, and privacy. Finally, it outlines future directions, including explainable systems, efficient processing mechanisms, and advanced video analytics, as guidance for researchers and practitioners.
📝 Abstract
The explosive growth of video data has driven the development of distributed video analytics in cloud-edge-terminal collaborative (CETC) systems, enabling efficient video processing, real-time inference, and privacy-preserving analysis. By distributing video processing tasks and adapting analytics across cloud, edge, and terminal devices, CETC systems have enabled breakthroughs in video surveillance, autonomous driving, and smart cities. In this survey, we first analyze fundamental architectural components, including hierarchical, distributed, and hybrid frameworks, alongside edge computing platforms and resource management mechanisms. Building upon these foundations, we review edge-centric approaches, which emphasize on-device processing, edge-assisted offloading, and edge intelligence, and cloud-centric methods, which leverage powerful computational capabilities for complex video understanding and model training. We also cover hybrid video analytics, which incorporates adaptive task offloading and resource-aware scheduling to optimize performance across the entire system. Beyond conventional approaches, recent advances in large language models and multimodal integration reveal both opportunities and challenges in platform scalability, data protection, and system reliability. Finally, we outline future directions, including explainable systems, efficient processing mechanisms, and advanced video analytics, offering insights for researchers and practitioners in this dynamic field.