🤖 AI Summary
Existing surveys on traffic surveillance systems (TSS) are fragmented: they lack a unified framework that bridges low-level perception (e.g., detection, tracking) and high-level understanding (e.g., parameter estimation, anomaly detection, behavior interpretation), and they overlook the paradigm shifts driven by foundation models. Method: We propose the first holistic “low-level → high-level” visual perception framework for TSS, systematically identifying and analyzing five fundamental bottlenecks: perceptual degradation, strong data dependency, weak semantic understanding, limited coverage, and high computational cost. Through a comprehensive evaluation of foundation models (including ViT, CLIP, and diffusion models) in zero-shot learning, cross-modal semantic alignment, and scene generation, we construct a structured technology roadmap. Contribution/Results: Our work provides cross-task benchmarking, clarifies key challenges and viable integration pathways for foundation models in TSS, and delivers both theoretical foundations and practical guidelines for next-generation intelligent traffic perception systems.
📝 Abstract
Traffic Surveillance Systems (TSS) have become increasingly crucial in modern intelligent transportation systems, with vision-based technologies playing a central role in scene perception and understanding. While existing surveys typically focus on isolated aspects of TSS, a comprehensive analysis that bridges low-level and high-level perception tasks, particularly one that considers emerging technologies, remains lacking. This paper presents a systematic review of vision-based technologies in TSS, examining both low-level perception tasks (object detection, classification, and tracking) and high-level perception applications (parameter estimation, anomaly detection, and behavior understanding). Specifically, we first provide a detailed methodological categorization and a comprehensive performance evaluation for each task. Our investigation reveals five fundamental limitations in current TSS: perceptual data degradation in complex scenarios, data-driven learning constraints, semantic understanding gaps, sensing coverage limitations, and computational resource demands. To address these challenges, we systematically analyze five categories of potential solutions: advanced perception enhancement, efficient learning paradigms, knowledge-enhanced understanding, cooperative sensing frameworks, and efficient computing frameworks. Furthermore, we evaluate the transformative potential of foundation models in TSS, demonstrating their unique capabilities in zero-shot learning, semantic understanding, and scene generation. This review provides a unified framework bridging low-level and high-level perception tasks, systematically analyzes current limitations and solutions, and presents a structured roadmap for integrating emerging technologies, particularly foundation models, to enhance TSS capabilities.