🤖 AI Summary
This work addresses the trade-off between accuracy and computational cost in vehicle speed estimation from traffic surveillance video. We propose a lightweight monocular 3D speed estimation method tailored for edge deployment. Methodologically, we combine 2D object detection with vanishing-point-based geometric constraints to reconstruct 3D bounding boxes without depth sensors, and we use smaller models with post-training INT8 quantization to reduce computational cost while preserving geometric consistency. Evaluated on the BrnoCompSpeed dataset, our best model achieves a median speed error of 0.58 km/h and detection precision and recall above 91%, while running 5.5× faster than the previous state of the art. Our key contribution is combining vanishing-point-guided monocular 3D reconstruction with quantized deployment, balancing accuracy, robustness, and real-time performance, and delivering a practical solution for edge-based traffic perception.
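Post-training quantization is named above but not specified. The sketch below shows one common variant, symmetric per-tensor INT8 quantization of a weight list, purely to illustrate the idea; it is an assumption, not necessarily the scheme used in the paper, and real deployment toolchains usually also calibrate activation ranges on sample data. The function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: q = clamp(round(w / s), -127, 127)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to scale 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values: w_hat = q * s."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.031, 1.0]
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    print(q, s)
    # Round-to-nearest keeps each reconstruction error within s / 2.
    print(max(abs(a - b) for a, b in zip(w, w_hat)))
```

Storing 8-bit integers plus one float scale per tensor is what shrinks the model and enables integer arithmetic on edge hardware; the cost is a bounded rounding error per weight.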
📝 Abstract
This paper presents a computationally efficient method for vehicle speed estimation from traffic camera footage. Building upon previous work that uses 3D bounding boxes derived from 2D detections and vanishing point geometry, we introduce several improvements that enhance real-time performance. We evaluate several variants of our method on the BrnoCompSpeed dataset in terms of vehicle detection and speed estimation accuracy. Our extensive evaluation across various hardware platforms, including edge devices, demonstrates significant gains in frames per second (FPS) over the prior state of the art while maintaining comparable or improved speed estimation accuracy. We analyze the trade-off between accuracy and computational cost and show that smaller models using post-training quantization offer the best balance for real-world deployment. Our best-performing model outperforms the previous state of the art in median vehicle speed estimation error (0.58 km/h vs. 0.60 km/h), detection precision (91.02% vs. 87.08%), and recall (91.14% vs. 83.32%), while also being 5.5 times faster.
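The abstract reports median speed error, detection precision, and recall. The sketch below illustrates how such quantities are typically computed, together with a simple speed estimate for one tracked vehicle. It assumes the vehicle's positions are already expressed in metric road-plane coordinates (the pipeline above obtains 3D boxes via vanishing-point geometry; that image-to-world mapping is not shown), that detections are already matched to ground truth, and that speed is the median per-frame displacement times the frame rate. The official BrnoCompSpeed evaluation may differ in detail, and all function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) on the road plane, in meters


def estimate_speed_kmh(positions_m: Sequence[Point], fps: float) -> float:
    """Median per-frame displacement of a tracked vehicle, converted to km/h.

    Assumes positions_m[i] is the vehicle's road-plane position in frame i,
    e.g. a reference point of its 3D bounding box mapped to metric coordinates.
    """
    if len(positions_m) < 2:
        raise ValueError("need at least two tracked positions")
    steps: List[float] = [
        ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        for (x0, y0), (x1, y1) in zip(positions_m, positions_m[1:])
    ]
    meters_per_second = median(steps) * fps
    return meters_per_second * 3.6  # 1 m/s = 3.6 km/h


def median_abs_error(pred_kmh: Sequence[float], true_kmh: Sequence[float]) -> float:
    """Median absolute speed error over matched vehicles, in km/h."""
    return median([abs(p - t) for p, t in zip(pred_kmh, true_kmh)])


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Detection precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy track: 1 m per frame at 25 FPS corresponds to 25 m/s, i.e. 90 km/h.
    track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(estimate_speed_kmh(track, fps=25.0))
    print(median_abs_error([90.0, 60.5, 70.0], [89.0, 60.0, 72.0]))
    print(precision_recall(tp=90, fp=10, fn=5))
```

Taking the median of per-frame displacements, rather than the mean, keeps a single badly localized box from dominating the speed estimate; the same robustness argument motivates reporting the median error across vehicles.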